“What model are you considering?” “I am not considering one — I am using analysis
of variance.” 


N. R. Draper and H. Smith 


The analysis of variance is (not a mathematical theorem but) a simple method of arrang- 
ing arithmetical facts so as to isolate and display the essential features of a body of data 
with the utmost simplicity. 


Sir Ronald A. Fisher 


No aphorism is more frequently repeated in connection with field trials, than that we 
must ask Nature few questions or, ideally, one question, at a time. The writer is convinced
that this view is wholly mistaken. Nature, he suggests, will best respond to a logical 
and carefully thought out questionnaire; indeed, if we ask her a single question, she will 
often refuse to answer until some other topic has been discussed. 


Sir Ronald A. Fisher 


The statistician is no longer an alchemist expected to produce gold from any worthless 
material offered him. He is more like a chemist capable of assaying exactly how much 
of value it contains, and capable also of extracting this amount, and no more. In these 
circumstances, it would be foolish to commend a statistician because his results are 
precise or to reprove because they are not. If he is competent in his craft, the value of 
the result follows solely from the value of the material given him. It contains so much 
information and no more. His job is only to produce what it contains. 


Sir Ronald A. Fisher 


The new methods occupy an altogether higher plane than that in which ordinary statistics 
and simple averages move and have their being. Unfortunately, the ideas of which they 
treat, and still more the many technical phrases employed in them, are as yet unfamiliar. 
The arithmetic they require is laborious, and the mathematical investigations on which 
the arithmetic rests are difficult reading even for experts . . . this new departure in science 
makes its appearance under conditions that are unfavourable to its speedy recognition, 
and those who labour in it must abide for some time in patience before they can receive 
sympathy from the outside world. 


Sir Francis Galton


When statistical data are collected as natural observations, the most sensible assumptions 
about the relevant statistical model have to be inserted. In controlled experimentation, 
however, randomness could be introduced deliberately into the design, so that any sys- 
tematic variability other than [that] due to imposed treatments could be eliminated. 


The second principle Fisher introduced naturally went with the first. With statistical 
analysis geared to the design, all variability not ascribed to the influence of treatments 
did not have to inflate the random error. With equal numbers of replications for the 
treatment, each replication could be contained in a distinct block, and only variability 
among plots in the same block were a source of error — that between blocks could be 
removed. 


Sir Maurice S. Bartlett 


Preface 


Analysis of variance (ANOVA) models have become one of the most widely
used tools of modern statistics for analyzing multifactor data. They
provide versatile statistical tools for studying the relationship between
a dependent variable and one or more independent variables, and they
are employed to determine whether different variables interact and which
factors or factor combinations are most important. They are appealing because 
they provide a conceptually simple technique for investigating statistical rela- 
tionships among different independent variables known as factors. 

Several texts and monographs on the subject are currently available.
However, some of them, such as those of Scheffé (1959) and Fisher and
McDonald (1978), are written for mathematically advanced readers, requiring
a good background in calculus, matrix algebra, and statistical theory, whereas
others, such as Guenther (1964), Huitson (1971), and Dunn and Clark (1987),
although they assume only a background in elementary algebra and statistics, 
treat the subject somewhat scantily and provide only a superficial discussion of 
the random and mixed effects analysis of variance. 

This book has been designed to bridge this gap. It provides a thorough and 
elementary discussion of the commonly employed analysis of variance mod- 
els with emphasis on intelligent applications of the methods. We have tried 
to present a logical development of the subject covering both the assump- 
tions made and methodological and computational details of the techniques 
involved. Most of the important results on estimation and hypothesis test- 
ing related to analysis of variance models have been included. An attempt 
has been made to present as many necessary concepts, principles, and tech- 
niques as possible without resorting to the use of advanced mathematics and 
statistical theory. In addition, the book contains complete citations of most 
of the important and related works, including an up-to-date and comprehen- 
sive bibliography of the field. A notable feature of the presentation is that the 
fixed, random, and mixed effects analysis of variance models are treated in
tandem. 

No attempt has been made to present the theoretical derivations of the results 
and techniques employed in the text. In such cases the reader is referred to the 
appropriate sources where such discussions can be found. It is hoped that the 
inquisitive reader will go through these sources to get a thorough grounding in 
the theory involved. However, whenever considered appropriate, some elemen- 
tary derivations involving results on expectations of mean squares have been 
included. 
The computational formulae needed to perform analysis of variance calculations
are presented in full detail in order to facilitate the requisite computations
using a handheld scientific calculator. Many modern electronic calculators are 
sufficiently powerful to handle complex arithmetic and algebraic computations 
and can be readily employed for this task. We believe that researchers
and scientists should clearly understand a procedure before using it, and that manual
computations provide a better understanding of the working of a procedure as
well as of any limitations of the experimental data. In addition, a separate chapter
has been included to describe the use of some well-known statistical packages 
to perform a computer-assisted analysis of variance and specific commands rel- 
evant to individual analysis of variance models are included. Each chapter con- 
tains a number of worked examples where both manual and computer-assisted 
analysis of variance computations are illustrated in complete detail. 

The only prerequisite for the understanding of the material is a preparation in 
a precalculus introductory course in statistical inference with special emphasis 
on the principles of estimation and hypothesis testing. Although some of the 
statistical concepts and principles used in the text (e.g., maximum likelihood 
and minimum variance unbiased estimation, and other related concepts and 
principles) may not be familiar to students who have not taken any intermediate 
and advanced level courses in statistical inference, these concepts have been 
included for the sake of completeness and to enhance the reference value of 
the text. However, the use of such results is generally incidental, without any 
mathematical and technical formality, and can be skipped without any loss 
in continuity. In addition, most of the results of this nature are usually kept 
out of the main body of the text and are generally indicated under a remark 
or following a footnote. Moreover, many important results in probability and 
statistics useful for understanding the analysis of variance models have been 
included in the appendices. 

The book can be employed as a textbook in an introductory analysis of
variance course for students whose specialization is not statistics, such as those
in biological, social, engineering, and management sciences, but who nevertheless
use analysis of variance quite extensively in their work. It can also be used as
a supplement for a theoretical course in analysis of variance to balance and 
complement the theoretical aspects of the subject with practical applications. 
The book contains ample discussion and references to many theoretical results 
and will be immensely useful to students with advanced training in statistics. 
The investigators concerned with the analysis of variance techniques of data 
analysis will also find it a useful source of reference for many important results 
and applications. 

Inasmuch as the principles and procedures of the analysis of variance are 
fairly general and common to all academic disciplines, the book can be em- 
ployed as a text in all curricula. Although the examples and exercises are 
drawn primarily from behavioral, biological, engineering, and management 
sciences, the illustrations and interpretations are relevant to all disciplines. This 
underscores the interdisciplinary nature of research problems from all substan-
tive fields and the absolute generality and discipline-free nature of statistical
theory and methodology. Examples and exercises, in addition to assessing basic 
definitional and computational skills, are designed to illustrate basic conceptual 
understanding including applications and interpretations. 

The textbook contains an abundance of footnotes and remarks. They are 
intended for statistically sophisticated readers who wish to pursue the subject 
matter in greater depth and it is not necessary that the student beginning an 
analysis of variance course read them. They often expand and elaborate on 
a particular theme, point the way to generalization and to other techniques, 
and make historical comments and remarks. In addition, they contain literature 
citations for further exploration of the topic and refer to finer points of theory and 
methods. We are confident that this approach will be pedagogically appealing 
and useful to readers with a higher degree of scholarly interest. 

Finally, there is something to be said concerning the use of real-life data. We 
certainly believe that real-life examples are important as motivational devices 
for students to convince them that the techniques are indeed used in substantive 
fields of research; and, whenever possible, we have tried to make use of data 
from actual experiments and studies reported in books and papers by other 
authors. However, real-life data that are easy to describe, without requiring too 
much time and space, and are helpful in illustrating a particular technique are not 
always easy to find. Thus, we have included many examples and exercises that 
are realistically constructed using hypothetical data in order to fit the illustration 
of a particular statistical model under consideration. We believe that in many 
instances the use of such “artificial” data is just as instructive and motivating as 
the “real” data. The reader interested in working through some more examples
and exercises involving real-data sets is referred to the books by Andrews and 
Herzberg (1985) and Hand et al. (1993). 
Introduction


1.0 PREVIEW 


The variation among physical observations is a common characteristic of all 
scientific measurements. This property of observations, that is, their failure to 
reproduce themselves exactly, arises from the necessity of taking the observa- 
tions under different conditions. Thus, in a given experiment, readings may 
have to be taken by different persons at different periods of time or under dif- 
ferent operating or experimental conditions. For example, there may be a large 
number of external conditions over which the experimenter has no control. 
Many of these uncontrolled external conditions may not affect the results of the 
experiment to any significant degree. However, some of them may change the 
outcome of the experiment appreciably. Such external conditions are commonly 
known as the factors. 

The analysis of variance methodology is concerned with the investigation of 
the factors likely to contribute significant effects, by suitable choice of exper- 
iments. It is a technique by which variations associated with different factors 
or defined sources may be isolated and estimated. The procedure involves the 
division of total observed variation in the data into individual components at- 
tributable to various factors and those due to random or chance fluctuation, and 
performing tests of significance to determine which factors influence the ex- 
periment. The methodology was originally developed by Sir Ronald A. Fisher 
(1918, 1925, 1935) who gave it the name of “analysis of variance.” The analy- 
sis of variance is the most widely used tool of modern (post-1950) statistics by 
research workers in the substantive fields of biology, psychology, sociology, ed- 
ucation, agriculture, engineering, and so forth. The demand for knowledge and 
development of this topic has largely come from the aforementioned substan- 
tive fields. The development of analysis of variance methodology has in turn 
affected and influenced the types of experimental research being carried out in 
many fields. For example, in quantitative genetics which relies extensively on 
separating the total variation into environmental and genetic components, many 
of the concepts are directly linked to the principles and procedures of the anal- 
ysis of variance. Nowadays, the analysis of variance models are widely used to 
analyze the effects of the independent variables under study on the dependent 
variable or response measure of interest. 

The general synthesis of the analysis of variance procedure can be summarized
as follows. Given a collection of $n$ observations $y_1, y_2, \ldots, y_n$, we define the
aggregate variation, called the total sum of squares, by

$$SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2,$$

where

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i.$$

Then the technique consists of partitioning the total sum of squares into component
variations due to different factors, also called sums of squares. For example,
suppose there are Q such factors. Then the total sum of squares ($SS_T$) is
partitioned as

$$SS_T = SS_A + SS_B + \cdots + SS_Q,$$

where $SS_A, SS_B, \ldots,$ and $SS_Q$ represent the sums of squares associated with
factors A, B, ..., and Q, respectively, which account in some sense for
the variation that can be attributed to these factors or sources of variation. The
experiment is so designed that each sum of squares reflects the effect of just
one factor or that of the random error attributed to chance. Furthermore, all
such sums of squares are made comparable by dividing each by an appropriate
number known as the degrees of freedom.¹ The quantities obtained by dividing
each sum of squares by the corresponding degrees of freedom are called mean
squares. The mean squares and other relevant quantities provide the basis for
making statistical inference either in terms of a test of hypothesis or point and
interval estimation. For example, the sampling distribution of the appropriate
ratio of a pair of mean squares often permits certain tests to be made about the
effects of the factors involved. In this way, we can decide whether the apparent
effect of any one factor is readily explainable purely by chance.
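To make the partition concrete, consider the simplest case: a single factor A whose levels define groups, so that the total sum of squares splits into a between-groups part $SS_A$ and a within-groups (error) part $SS_E$. The Python sketch below is illustrative only; the function name `one_way_anova` and the data are hypothetical, and it assumes every group is non-empty and at least one group has more than one observation. It computes $SS_T$ both from its definition and from the equivalent "correction term" form $\sum y_i^2 - (\sum y_i)^2/n$ that is convenient on a hand calculator, divides $SS_A$ and $SS_E$ by their degrees of freedom $a-1$ and $n-a$, and forms the ratio of the two mean squares.

```python
from statistics import fmean

def one_way_anova(groups):
    """Partition SS_T for a one-way layout into SS_A (between groups) and SS_E (within groups)."""
    ys = [y for g in groups for y in g]
    n, a = len(ys), len(groups)
    grand = fmean(ys)
    means = [fmean(g) for g in groups]
    ss_t = sum((y - grand) ** 2 for y in ys)               # definitional form
    ss_t_calc = sum(y * y for y in ys) - sum(ys) ** 2 / n  # hand-calculator form
    ss_a = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_e = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_a = ss_a / (a - 1)  # a - 1 degrees of freedom for factor A
    ms_e = ss_e / (n - a)  # n - a degrees of freedom for error
    return {"SS_T": ss_t, "SS_T_calc": ss_t_calc, "SS_A": ss_a, "SS_E": ss_e,
            "MS_A": ms_a, "MS_E": ms_e, "ratio": ms_a / ms_e}

print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

For these hypothetical groups the sketch gives $SS_T = 60 = 54 + 6$, $MS_A = 27$, $MS_E = 1$, and a mean square ratio of 27.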

The use of the term “analysis of variance” to describe the statistical methodology
dealing with the means of a variable across groups of observations is not
wholly satisfactory. The term seems to have its origin in the work of Fisher
(1918), who partitioned the total variance of a human attribute into
components attributed to heredity, environment, and other factors; this led to an
equation

$$\sigma^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_k^2,$$

where $\sigma^2$ is the total variance and the $\sigma_i^2$'s are variances associated with different
factors. In such a case, the term “analysis of variance” is entirely appropriate.
However, as Kempthorne (1976) states, this is rather a limited view of the entire
statistical methodology falling under the nomen of “analysis of variance.”

1 The degrees of freedom designate the number of statistically independent quantities (response
variables) that comprise a sum of squares (see Section 2.1 for further details).
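The additivity in Fisher's equation rests on the components being uncorrelated: the variance of a sum of uncorrelated variables is the sum of their variances. A quick Monte Carlo check in Python, assuming independent normal components with hypothetical standard deviations 1, 2, and 0.5, shows the sample variance of the total settling near $\sigma_1^2 + \sigma_2^2 + \sigma_3^2 = 5.25$. The function `simulate_total_variance` is our own illustration, not a method from the text.

```python
import random
from statistics import pvariance

def simulate_total_variance(sds, n=100_000, seed=1):
    """Draw n totals, each the sum of independent N(0, s^2) components (one per s in sds).

    Returns (sample variance of the totals, sum of the component variances)."""
    rng = random.Random(seed)
    totals = [sum(rng.gauss(0.0, s) for s in sds) for _ in range(n)]
    return pvariance(totals), sum(s * s for s in sds)

observed, theoretical = simulate_total_variance([1.0, 2.0, 0.5])
print(f"observed {observed:.3f} vs sum of components {theoretical:.3f}")
```

If the components were correlated, covariance terms would appear and the simple sum would no longer hold.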


1.1 HISTORICAL DEVELOPMENTS 


Credit for much of the early developments in the field of the analysis of variance 
goes to Sir Ronald Aylmer Fisher³ (1890–1962), who originally developed it in
the 1920s.⁴ He was the pioneer and innovator of the uses and applications of
statistical methods in experimental design. Shortly after the end of World War 
I, Fisher resigned a public school teaching job and accepted a position at the 
Statistical Laboratory of the Rothamsted Agricultural Experimental Station in 
Harpenden, England. The center was heavily engaged in agricultural research 
and for many years Fisher was in charge of statistical data analysis there. He 
developed and employed the analysis of variance as the principal method of 
data analysis in experimental design. Frank Yates was Fisher’s coworker at 
Rothamsted, and both of them collaborated on many important research projects. 
Yates also was primarily responsible for making many early contributions to 
the literature on analysis of variance and experimental design. Moreover, both 
Fisher and Yates made numerous indirect contributions through the staff of the 
Rothamsted Experimental Station.

On the retirement of Karl Pearson in 1933, Fisher was appointed Galton Professor
at the University of London, a chair created by an endowment from Sir Francis
Galton, the great founder of the science of eugenics. Later, he moved to the
faculty of Cambridge University to accept the Arthur Balfour Chair. He also traveled abroad widely
and held visiting professorships at several universities throughout the world. In 
1936, Fisher visited the United States and received an honorary degree from 
Harvard University on the occasion of its Tercentenary Celebrations. Early
in 1937, he accepted the Honorary Fellowship of the Indian Statistical Insti- 
tute and an invitation to preside over the first session of the Indian Statistical 
Congress held in January, 1938. In 1935, Fisher published a book, The Design of 
Experiments, in which logical and theoretical principles of experimental design 
were developed and expounded with a great variety of illustrative examples. 
The book has gone through numerous editions and has become a classic of 
statistical literature. 

Fisher developed and used analysis of variance mainly during the 1920s and
1930s (see Fisher (1918, 1925, 1935)) for the study of data from agricultural
experiments. From the beginning, Fisher employed both fixed effects and random
effects models (the latter, at least, when he treated intraclass correlations), and
alluded to what was later designated a “mixed model” in his book The Design
of Experiments. The useful “analysis of variance table,” including sums of
squares, degrees of freedom, and mean squares for various sources of variation,
was first published in a paper by Fisher and Mackenzie (1923). Tippett (1931)
appears to have been the first to include a column of expected mean squares for
variance components and to estimate these components.

2 Kendall et al. (1983, p. 2) point out that it would be more appropriate to call it “analysis of sum
of squares,” “but history and brevity are against this logical usage.”

3 The brief biographical sketch of Fisher given here has been drawn, plagiaristically in some ways,
from Mahalanobis (1962).

4 There had been some early work on analysis of variance carried out by W. H. R. A. Lexis and
T. N. Thiele in the late nineteenth century (Kendall et al. (1983, p. 2)).

Eisenhart (1947a) seems to have originated the terms “Model I” and “Model
II” for fixed effects and random effects models, and he also mentions the
possibility of a mixed model. Mixed models have been designated as Model III
(see, e.g., Ostle and Malone (1988)). Some authors use Model III to denote 
random effects models in which random effects are drawn from a finite pop- 
ulation of possible values (see, e.g., Dunn and Clark (1974, Chapter 9)). We 
have referred to these models as finite population models (see Chapter 9). Prior 
to Eisenhart (1947a), more formal treatments of the models had appeared in the 
papers by Daniels (1939) and Crump (1946) regarding random effects models, 
and by Jackson (1939) regarding the mixed model. However, a more complete 
treatment of the mixed model did not appear until Scheffé (1956a). The devel- 
opment of various models was very intensive in the 1940s and 1950s. These 
early developments in this field can be found in expository articles by Crump 
(1951) and Plackett (1960). 

Most of the early applications of analysis of variance were in the field of 
agricultural sciences. Fisher’s application of statistical theory in agricultural 
experiments brought many new developments and advances in the field. In fact, 
much of modern statistics originated to meet the research needs of agricultural
experimental stations, and much of the terminology of the field derives from
this agricultural legacy. However, many of the experimental design terms,
such as “treatment,” “plot,” and “block,” used earlier in an agricultural
context, have lost their original meaning and are nowadays used in all areas
of research.

Today, the methods of experimental design and analysis of variance are com- 
monly used in nearly all fields of study and scientific investigation. Some of 
the disciplines where statistical design and analysis of experiments are rou- 
tinely used include agriculture, biology, medicine, health, physical sciences, 
engineering, education, and social and behavioral sciences. 


1.2 ANALYSIS OF VARIANCE MODELS 


It is assumed that an analysis of variance model for the observations can be 
approximated by linear combinations (functions) of certain unobservable quan- 
tities known as “effects” corresponding to each factor of the experiment. The 
effects may be of two kinds: systematic or random. If the effect is systematic, 
it is called a fixed effect or Model I effect; otherwise it is called a random 
effect or Model II effect. The equation expressing the observations as a linear 
combination of the effects is known as a linear model. 
In every linear model there is at least one set of random effects, equal in
number to the number of observations, a different one of which appears in 
every observation. This is called the residual effect or the error term. Further- 
more, there is usually one fixed effect that appears in every model equation 
and is known as the general constant and is the mean, in some sense, of all the 
observations. Thus, a general linear model is represented as⁵

$$y_{ijk\ldots q} = \mu + \alpha_i + \beta_j + \gamma_k + \cdots + e_{ijk\ldots q}, \qquad (1.2.1)$$

where $y_{ijk\ldots q}$ is the observed score, $-\infty < \mu < \infty$ is the overall mean common
to all the observations, $\alpha_i, \beta_j, \gamma_k, \ldots$ are unobservable effects attributable to
different factors or sources of variation, and $e_{ijk\ldots q}$ is an unobservable random
error associated with the observation $y_{ijk\ldots q}$ and is assumed to be independently
distributed with mean zero and variance $\sigma_e^2$. Model (1.2.1) is called a fixed effects
model, or Model I, if the only random effects in the model equation are
the error terms. Thus, under a fixed effects model, the quantities $\alpha_i, \beta_j, \gamma_k, \ldots$
are assumed to be constants. The objective in a fixed effects model is to make
inferences about the unknown parameters $\mu, \alpha_i, \beta_j, \gamma_k, \ldots,$ and $\sigma_e^2$. It is called
a random effects model, a variance components model, or Model II if all the
effects in the model equation except the additive constant are random effects.
Thus, under a random effects model, the quantities $\alpha_i, \beta_j, \gamma_k, \ldots$ are assumed to
be random variables with means of zero and variances $\sigma_\alpha^2, \sigma_\beta^2, \sigma_\gamma^2, \ldots,$ respectively.
The objective in a random effects model is to make inferences about the
variances $\sigma_\alpha^2, \sigma_\beta^2, \sigma_\gamma^2, \ldots, \sigma_e^2$ and/or certain functions of them. A case falling
under none of these categories is called a mixed model or Model III. Thus, in
a mixed model, some of the effects in the model equation are fixed and some 
are random. Mixed models contain a mixture of fixed and random effects and 
therefore represent a blend of the fixed and the random effects models. Mixed 
models include, as special cases, the fixed effects model in which all effects are 
assumed fixed except the error terms, and the random effects model in which 
all effects are assumed random except the general constant. The objective in a 
mixed model is to make inferences about the fixed effect parameters and the 
variances of the random effects. There are widespread applications of such 
models in a variety of substantive fields including genetics, animal husbandry, 
social sciences, and engineering. 
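The distinction between Models I, II, and III lies in how the effects in (1.2.1) are generated, which a small simulation makes visible. The Python sketch below assumes a two-factor version of (1.2.1) with one observation per cell and normal random terms (normality is an assumption of the sketch, not of the definitions above); the function `simulate_two_factor` and its arguments are hypothetical. Supplying a list of constants for a factor treats it as fixed, so the same effects recur in every replication of the experiment; supplying only a standard deviation treats it as random, so fresh effects are drawn each time.

```python
import random

def simulate_two_factor(a, b, mu, sigma_e, alpha=None, beta=None,
                        sigma_alpha=None, sigma_beta=None, rng=None):
    """Generate y[i][j] = mu + alpha_i + beta_j + e_ij for an a x b layout.

    A factor is fixed if its effects are passed in as constants and random
    if only its standard deviation is given (effects then drawn from N(0, sd^2))."""
    if rng is None:
        rng = random.Random()
    if alpha is None:
        if sigma_alpha is None:
            raise ValueError("give alpha (fixed) or sigma_alpha (random)")
        alpha = [rng.gauss(0.0, sigma_alpha) for _ in range(a)]
    if beta is None:
        if sigma_beta is None:
            raise ValueError("give beta (fixed) or sigma_beta (random)")
        beta = [rng.gauss(0.0, sigma_beta) for _ in range(b)]
    # The error term e_ij is random in every model, with mean 0 and variance sigma_e^2
    y = [[mu + alpha[i] + beta[j] + rng.gauss(0.0, sigma_e) for j in range(b)]
         for i in range(a)]
    return y, alpha, beta

rng = random.Random(42)
# Model I: both factors fixed
y1, _, _ = simulate_two_factor(3, 4, 10.0, 1.0, alpha=[-1, 0, 1], beta=[2, 0, -1, -1], rng=rng)
# Model II: both factors random
y2, _, _ = simulate_two_factor(3, 4, 10.0, 1.0, sigma_alpha=2.0, sigma_beta=0.5, rng=rng)
# Model III (mixed): factor A fixed, factor B random
y3, _, _ = simulate_two_factor(3, 4, 10.0, 1.0, alpha=[-1, 0, 1], sigma_beta=0.5, rng=rng)
```

Calling the mixed case twice returns the same $\alpha_i$ both times but different $\beta_j$, mirroring the idea that a repetition of the experiment reuses fixed levels and resamples random ones.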

Throughout this volume we consider analysis of variance based on Models I, 
II, and III. These are the most widely applicable models although other models 
have also been proposed (Tukey (1949a)). 


5 In a linear model it is customary to use a lower or an upper case Roman letter to represent a random
effect and a Greek letter to designate a fixed effect. However, the practice is not universal, and
some authors do just the opposite; that is, they use Greek letters to represent random effects and
Roman letters to designate fixed effects (see, e.g., Kempthorne and Folks (1971, pp. 456–470)).
In order to keep our notation simple and uniform, we use Greek letters to represent both fixed
and random effects except the error or residual term, which is denoted by the lower case Roman
letter e.
1.3 CONCEPT OF FIXED AND RANDOM EFFECTS


Whether an effect should be considered fixed or random will depend on the 
way the experimental treatments (levels of a factor) are selected and the kind 
of inferences one wishes to make from the analysis. In the fixed effects case, 
the experimenter is working with a systematically chosen set of treatments or 
levels of factors that are of particular intrinsic interest, and the inferences are 
to be made only about differences among the treatments actually being studied 
and about no other treatments that might have been included. 

Thus, in the case of fixed effects, in advance of the actual experiment, the 
experimenter decides that he wants to see if differences in effects exist among 
some predetermined set of treatments or treatment combinations. His interest 
lies in these treatments or treatment combinations and no others. That is, treat- 
ments are chosen a priori by the researcher because they are of special interest 
to him. Each treatment of practical interest to the experimenter has been
included in the experiment, and the set of treatments or treatment combinations
included in the experiment covers the entire set of treatments about which the
experimenter wants to make inferences. The effect of any treatment is “fixed”
in the sense that it must appear in any new trial of the experiment on other
subjects or experimental units. Thus, model terms that represent treatments,
blocks, and interactions are parameters. 

An example of fixed effects may be an experiment in which the effects of 
different drugs on groups of animals are examined. Here, the treatments (drugs) 
are fixed and determined by the experimenter, and the interest lies in the results
of the treatments and the differences between them. This is also the case when
we test the effects of different levels (doses) of a given factor such as a chemical 
or the amount of light to which a plant has been exposed. Another example of 
fixed effects would be a study of body weights for several age groups of animals. 
The treatments (levels) would be age groups that are fixed. Other sets of factors 
or variables that are usually considered fixed are types of disease, treatment 
therapy, gender, marital and socioeconomic status, and so forth. 
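Because inferences in the fixed effects case concern only the treatments actually studied, the natural estimates are built from the observed treatment means. The Python sketch below assumes hypothetical, balanced data for three drugs and the one-way version of (1.2.1), $y_{ij} = \mu + \alpha_i + e_{ij}$, together with the common side condition $\sum_i \alpha_i = 0$ (a convention adopted for this sketch, not one stated above). It estimates $\mu$ by the grand mean and each $\alpha_i$ by the deviation of its treatment mean from that grand mean. Differences such as $\alpha_2 - \alpha_1$, which are what the experimenter compares, do not depend on the side condition.

```python
from statistics import fmean

def fixed_effect_estimates(groups):
    """Estimate mu and alpha_i in y_ij = mu + alpha_i + e_ij (balanced data),
    using the side condition that the alpha_i sum to zero."""
    if len({len(g) for g in groups}) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    mu_hat = fmean([y for g in groups for y in g])
    alpha_hat = [fmean(g) - mu_hat for g in groups]
    return mu_hat, alpha_hat

# Hypothetical responses of animals given drugs 1, 2, and 3
drugs = [[12, 14, 13], [18, 17, 19], [9, 10, 11]]
mu_hat, alpha_hat = fixed_effect_estimates(drugs)
print(mu_hat, alpha_hat)
```

Here the drug means are 13, 18, and 10, so the estimated difference between drugs 2 and 1 is 5 whatever side condition is used.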

In the random effects case, the experimenter is working with a randomly 
selected subset of treatments from a much larger population of all possible 
treatments about which the experimenter wants to make inferences. The subset 
of treatments included in the experiment does not exhaust the set of all possible
treatments of interest. Thus, the treatment levels at which the experiment is 
conducted are not of interest in themselves, but rather they represent some 
of the many treatments on which the experiment could have been performed. 
Here, the effect of the treatment is not regarded as fixed, since any particular 
treatment itself need not be included each time the experiment is carried out. In 
such a case, in any repetition of the experiment a new sample of treatments is to 
be included. The experimenter may not actually plan to repeat the experiment, 
but conceptually each repetition involves a fresh sample of treatments. The 
interest of the researcher in such situations lies in determining whether different 
treatment levels yield different responses. 
For an example of random effects, suppose that in a psychological experiment
the personality of the experimenter herself may have an effect on the results. 
There may be a large group of people, each presumably having a distinct person- 
ality, who might possibly serve as experimenters. Trying out each such person 
in the experiment would not be practically feasible. So, instead, one draws a 
random sample from some chosen population of potential experimenters. In 
this case, each experimenter constitutes an experimental treatment given to one 
group of subjects assigned to her at random. Since the experimental treatments 
employed are themselves a random sample and inferences are to be made about 
experimenter effects extending to the population of potential experimenters, the 
effects are regarded as random. Other examples of factors or variables that are 
usually considered random are animals, days, subjects, plots, and so forth. 
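In the random effects case, interest shifts from the particular experimenters to the variability among all potential experimenters, measured by a variance component. As an illustration, the Python sketch below assumes the balanced one-way random model $y_{ij} = \mu + \alpha_i + e_{ij}$, with $a$ experimenters each assigned $n$ subjects, $\alpha_i$ having variance $\sigma_\alpha^2$ and $e_{ij}$ variance $\sigma_e^2$, all mutually independent. It uses the standard expected mean squares for this layout, $E(MS_E) = \sigma_e^2$ and $E(MS_A) = \sigma_e^2 + n\sigma_\alpha^2$, and solves them for the ANOVA (moment) estimators. The function name and the simulated data (normal effects with $\sigma_\alpha = 2$, $\sigma_e = 1$) are hypothetical.

```python
import random
from statistics import fmean

def variance_components(groups):
    """ANOVA estimators (s2_alpha, s2_e) for a balanced one-way random model.

    Solves E[MS_E] = s2_e and E[MS_A] = s2_e + n * s2_alpha for the two variances."""
    a, n = len(groups), len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    means = [fmean(g) for g in groups]
    grand = fmean(means)  # equals the overall mean when groups are balanced
    ms_a = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_e = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (a * (n - 1))
    return (ms_a - ms_e) / n, ms_e

# Simulate 200 randomly drawn experimenters, each testing 10 subjects
rng = random.Random(7)
data = []
for _ in range(200):
    effect = rng.gauss(0.0, 2.0)  # this experimenter's random effect alpha_i
    data.append([50.0 + effect + rng.gauss(0.0, 1.0) for _ in range(10)])
s2_alpha, s2_e = variance_components(data)
print(f"sigma_alpha^2 estimate {s2_alpha:.2f}, sigma_e^2 estimate {s2_e:.2f}")
```

With this many experimenters the estimates should fall near the values 4 and 1 used to generate the data; with only a few experimenters the estimate of $\sigma_\alpha^2$ is imprecise and can even be negative, a known drawback of this estimator.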

It is sometimes difficult to decide whether a given factor is fixed or ran- 
dom. The main distinction depends on whether the levels of the factor can be 
considered a random sample of a large collection of such levels of interest, 
or are fixed treatments whose differences the investigator wishes to investi- 
gate. Moreover, in many experimental studies involving the levels of a random 
factor, the researcher does not actually select a random sample from a large 
population of all the levels of interest. For example, animals, subjects, days, 
and the like, being studied are the ones that happen to be conveniently available. 
The usual assumption in such a situation is that the natural process leading to
the availability of such cases is a random one involving no systematic bias and 
the cases being studied are sufficiently representative of their class type. How- 
ever, if there are reasons to doubt the representativeness of the sample being 
studied, the estimation process will be biased, raising serious questions about 
the validity of the results. 


1.4 FINITE AND INFINITE POPULATIONS 


As mentioned earlier, in the case of random effects, treatments included in the 
experiment are usually assumed to be a random sample from a population of all 
possible treatments of interest. Such populations are generally considered to be 
of infinite size. However, in the definition of random effects it is not necessary 
to require that the population be of infinite size; it also could be finite. For 
example, a set of similar machines may be used for certain operations but 
measurements are obtained from only a limited number of them. In the case of 
a finite population, however, the population may be very large so that for all 
practical purposes the population may be considered to be infinite. 

Thus, while considering random effects, one must distinguish between two 
cases: when the population is finite and when it is infinite — either because 
it is really so or because the finite population is sufficiently large so that for 
all practical purposes, it can be considered as infinite. In this volume we are 
primarily concerned with random effects arising out of infinite populations. 
These are probably the most common situations in many real-life applications. 
Results from the so-called finite population theory are treated only
briefly. Under finite population theory, the analysis of
variance is performed in the standard way but the results on expected mean 
squares are different. Knowing the expected mean squares, one can then decide 
which of the ratios should be used to test for the statistical significance of the 
factors or sources of variation of interest. The details on analyses and results 
covering finite population situations can be found in the works of Bennett and 
Franklin (1954, p. 404), McHugh and Mielke (1968), Gaylor and Hartwell 
(1969), Searle and Fawcett (1970), and Sahai (1974a). 


1.5 GENERAL AND GENERALIZED LINEAR MODELS 


In this volume and in many other books dealing with the analysis of variance 
and regression models, much of the theory and methodology being described 
makes the fundamental assumption of the normality of the error terms. The 
analysis of variance models along with the linear regression models are com- 
monly known as the linear (statistical) models. The linear (statistical) models 
considered in this book are just one particular formulation of a model called 
the general linear model which presents a unified treatment of regression and 
analysis of variance models. A comprehensive treatment of statistical analysis 
based on general linear models can be found in Graybill (1961, 1976), Searle 
(1971b, 1987), Hocking (1985, 1996), Littell et al. (1991), Wang and Chow 
(1994), Rao and Toutenburg (1995), Christensen (1996), and Neter et al. (1990, 
1996). Furthermore, in recent years, considerable effort has been devoted to the 
development of a wider class of linear models encompassing other probability 
distributions for their error structure. For example, in many medical and epi- 
demiological works, a common type of outcome measure is a binary response 
and the mean response entails the binomial parameter. Similarly, in many health 
studies, the response variable is the observed number of cases from a certain rare 
disease, and, thus, the error structure has a Poisson distribution. Furthermore, 
in many environmental studies, the response variable often follows a gamma 
or inverse Gaussian distribution. The class of models that extends the theory
and methodology to such error distributions, with the normal-theory linear
model as a special case, is known as generalized linear models. A considerable
body of literature has been developed for these
models and the interested reader is referred to the books by McCullagh and 
Nelder (1983, 1989), Dobson (1990), and Hinkley et al. (1991). 


1.6 SCOPE OF THE BOOK 


In this volume, we consider univariate analysis of variance models, that is, 
models with a single response variate. However, many applications of analysis 
of variance models involve simultaneous measurements on several correlated 
response variates and for statistical analysis of multivariate response data one 
uses multivariate analysis of variance, which generalizes the analysis of variance 
procedure for univariate normal populations to multivariate normal populations.
The analysis of variance models with multivariate response are discussed in the 
works of Krishnaiah (1980), Anderson (1984), Morrison (1990), and Lindman 
(1992), among others. Similarly, the statistical methods presented in this book
are based on the normal-theory assumption for inference problems. Nonparametric
analysis of variance procedures based on ranks, which do not require the
normality assumption, are not considered. There is a significant body of
literature on nonparametric analysis of variance, and the interested reader is
referred to Conover (1971), Lehmann (1975), Daniel (1990), Sprent (1997), and
Hollander and Wolfe (1998), among others. Finally, the main focus of this book 
is on classical analysis of variance procedures, where parameters are assumed 
to be unknown constants, and the accepted statistical techniques are based on 
the frequentist theory of hypothesis testing developed by Neyman and Pearson. 
The impact of the frequentist approach is reflected in the established use of 
type I and type II errors, and p-values and confidence intervals for statistical 
analysis. In contrast, there is a growing body of literature on the use of the
Bayesian theory of statistical inference in linear models. In the Bayesian
approach, all parameters are regarded as "random" in the sense that all uncertainty about them
should be expressed in terms of a probability distribution. The basic paradigm 
of Bayesian statistics involves a choice of a joint prior distribution of all parame- 
ters of interest that could be based on objective evidence or subjective judgment 
or a combination of both. Evidence from experimental data is summarized by a
likelihood function, and the joint prior distribution multiplied by the likelihood 
function is the (unnormalized) joint posterior density. The final (normalized) 
joint posterior distribution and its marginals form the basis of all Bayesian in- 
ference (see, e.g., Lee (1997)). The Bayesian method frequently provides more 
information and makes inference more readily attainable than the traditional 
frequentist approach. A drawback of the Bayesian approach is that it tends to be
computationally intensive and, even for simple unbalanced mixed models, the 
computation of the joint posterior density of the parameters and its marginals in- 
volves high-dimensional integrals. Fortunately, as a result of recent advances in 
computing hardware and numerical algorithms, Bayesian methods have gained 
wide applicability, and the scope and complexity of Bayesian applications have 
greatly increased. Readers interested in the Bayesian approach to analysis of 
variance problems are referred to Box and Tiao (1973), Broemeling (1985), 
Schervish (1992), and Searle et al. (1992), among others. 
2.0 PREVIEW


In this chapter we consider the analysis of variance associated with experiments 
having only one factor or experimental variable. Such an experimental layout 
is commonly known as one-way classification in which sample observations 
are classified (grouped) by only a single criterion. It provides the simplest 
data structure containing one or more observations at every level of a single 
factor. One-way classification is a very useful model in statistics. Many complex 
experimental situations can often be considered as one-way classification. In the 
succeeding chapters, we discuss situations involving two or more experimental 
variables. 


2.1 MATHEMATICAL MODEL 


Consider an experiment having $a$ treatment groups or $a$ different levels of a
single factor. Suppose $n$ observations have been made at each level, giving a
total of $N = an$ observations. Let $y_{ij}$ be the observed score corresponding to
the $j$-th observation at the $i$-th level or treatment group. The analysis of variance
model for this experiment is given as

$$ y_{ij} = \mu + \alpha_i + e_{ij} \qquad (i = 1, \ldots, a;\ j = 1, \ldots, n), \tag{2.1.1} $$

where $-\infty < \mu < \infty$ is the general or overall mean (true grand mean) common
to all the observations, $\alpha_i$ is the effect due to the $i$-th level of the factor, and
$e_{ij}$ is the random error associated with the $j$-th observation at the $i$-th level or
treatment group.

Model (2.1.1) states that the score for observation $j$ at level $i$ is based on
the sum of three components: the true grand mean $\mu$ of all the treatment
populations, the effect $\alpha_i$ associated with the particular treatment $i$, and a third
part $e_{ij}$ which is strictly peculiar to the $j$-th observation made under the $i$-th
level or treatment group. The term $e_{ij}$ takes into account all those factors that
have not been included in model (2.1.1).


2.2 ASSUMPTIONS OF THE MODEL 


Before one can use model (2.1.1) to make inferences about the existence of 
effects, certain assumptions must be made: 
(i) The errors $e_{ij}$'s are assumed to be randomly distributed with mean zero
and common variance $\sigma_e^2$.

(ii) The errors associated with any pair of observations are assumed to be
uncorrelated; that is,

$$ E(e_{ij} e_{i'j'}) = 0 \quad \text{unless } i = i' \text{ and } j = j'. \tag{2.2.1} $$

(iii) Under Model I, the effects $\alpha_i$'s are assumed to be fixed constants subject
to the constraint that $\sum_{i=1}^a \alpha_i = 0$. This implies that the observations
$y_{ij}$'s are distributed with mean $\mu + \alpha_i$ and common variance $\sigma_e^2$.

(iv) Under Model II, the effects $\alpha_i$'s are also assumed to be randomly
distributed with mean zero and variance $\sigma_\alpha^2$. Furthermore, the $\alpha_i$'s are
uncorrelated with each other, and the $\alpha_i$'s and $e_{ij}$'s are also uncorrelated;
that is,

$$ E(\alpha_i \alpha_{i'}) = 0, \quad i \neq i', \tag{2.2.2} $$

and

$$ E(\alpha_i e_{i'j}) = 0 \quad \text{for all } i, i', \text{ and } j. $$

Then from model (2.1.1), we have $\sigma_y^2 = \sigma_e^2 + \sigma_\alpha^2$, and so $\sigma_e^2$ and $\sigma_\alpha^2$
are components of $\sigma_y^2$, the variance of an observation. Hence, $\sigma_e^2$ and
$\sigma_\alpha^2$ are called "components of variance" (see Appendix L). This implies
that the observations $y_{ij}$'s are distributed with mean $\mu$ and common
variance $\sigma_e^2 + \sigma_\alpha^2$.


Remarks: (i) Under Model I, the assumption that $\sum_{i=1}^a \alpha_i = 0$ can be made without
any loss of generality, since if $\sum_{i=1}^a \alpha_i$ were equal to some nonzero constant $c$, $\mu$ could
be replaced by $\mu + c/a$ and each $\alpha_i$ by $\alpha_i - c/a$; every $\mu + \alpha_i$ would be unchanged and
$\sum_{i=1}^a \alpha_i$ would then be equal to zero. Similarly, the $e_{ij}$'s could be
assumed to have zero mean. Under Model II, as in the case of the fixed effects model,
the assumption of a zero mean can also be made without any loss of generality. In this
case, however, we do not assume that $\sum_{i=1}^a \alpha_i = 0$, since the $\alpha_i$'s are randomly selected
from a population theoretically of infinite size, and their sum need not be zero. Instead
we have $E(\alpha_i^2) = \sigma_\alpha^2$ and $E(\alpha_i \alpha_{i'}) = 0$ for $i \neq i'$.
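A short simulation sketch of model (2.1.1) under both Models I and II may help fix ideas. All numerical values are illustrative, the function names are hypothetical, and normal errors and effects are an assumption of the sketch only (the model itself requires just zero means, common variances, and no correlation). `recentre` shows the reparametrization used to impose $\sum \alpha_i = 0$: subtracting $c/a$, where $c = \sum \alpha_i$, from each effect and adding it to $\mu$ leaves every $\mu + \alpha_i$ unchanged.

```python
import random


def recentre(mu, raw_effects):
    """Absorb c = sum(raw_effects) into the mean: return mu + c/a and alpha_i - c/a."""
    a = len(raw_effects)
    shift = sum(raw_effects) / a
    return mu + shift, [x - shift for x in raw_effects]


def simulate_one_way(mu, effects, n, sigma_e, rng):
    """Draw y_ij = mu + alpha_i + e_ij for i = 1..a, j = 1..n (model 2.1.1).

    Normal errors are an illustrative choice made by this sketch.
    """
    return [[mu + alpha + rng.gauss(0.0, sigma_e) for _ in range(n)]
            for alpha in effects]


rng = random.Random(2024)

# Model I: fixed effects with a nonzero sum, re-expressed so that they sum to zero.
mu_star, alphas_I = recentre(10.0, [3.0, 5.0, 10.0])   # 16.0, [-3.0, -1.0, 4.0]
data_I = simulate_one_way(mu_star, alphas_I, n=4, sigma_e=2.0, rng=rng)

# Model II: a fresh random sample of a = 3 effects with mean 0 and sd 1.5.
alphas_II = [rng.gauss(0.0, 1.5) for _ in range(3)]
data_II = simulate_one_way(10.0, alphas_II, n=4, sigma_e=2.0, rng=rng)
```

In Model I the same `alphas_I` would be reused in every repetition of the experiment; in Model II a new set of effects is drawn each time.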

(ii) Figure 2.1 is a schematic representation of the levels of a factor with $a$ levels,
under the assumptions of the fixed and random effects one-way analysis of variance
model. Under the fixed effects (Model I) case, $\mu_1, \mu_2, \ldots, \mu_a$ are the means of
preselected subpopulations fixed by the design. Under the random effects (Model II) case,
$\mu_1, \mu_2, \ldots, \mu_a$ are the means of $a$ randomly selected subpopulations from a population
with mean $\mu$ and variance $\sigma_\alpha^2$.
[Figure 2.1: two panels, Model I and Model II, each showing the 1st, 2nd, ..., $a$th groups as normal distributions.]

FIGURE 2.1 Schematic Representation of Model I and Model II.


(iii) One-way classification can be regarded as a one-way nested classification where 
a factor corresponding to “replications” or “samples” is nested within the levels of 
the treatment factor. Such a layout is often termed a two-stage nested design since it 
involves random sampling performed in two stages. One-way random effects models fre- 
quently arise in experiments involving a two-stage nested design. Some examples are as 
follows: 


(a) A sample of $a$ batches of toothpaste is selected at random from a production
process comprising a large number of batches, and a chemical analysis to determine
its composition is performed on $n$ samples taken from each batch.

(b) A sample of $a$ tobacco leaves is selected from a shipment, and a chemical
analysis is performed on $n$ samples taken from each leaf to determine the nicotine
and tar content.

(c) A sample of $a$ blocks is selected from a city having a large number of blocks,
and an interview is conducted on a sample of $n$ individuals from each block to
determine their voting preferences.

(d) A sample of $a$ bales is selected from a shipment of bales, and an analysis to
determine purity is performed on $n$ cores taken from each bale.
2.3 PARTITION OF THE TOTAL SUM OF SQUARES


Using the notation that a bar on the top and a dot in place of a suffix means that 
the particular suffix has been averaged out over the appropriate observations, 
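we write the following identity:

$$ y_{ij} - \bar{y}_{..} = (\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.}), \tag{2.3.1} $$

where

$$ \bar{y}_{i.} = \sum_{j=1}^n y_{ij}/n = y_{i.}/n \tag{2.3.2} $$

and

$$ \bar{y}_{..} = \sum_{i=1}^a \sum_{j=1}^n y_{ij}/an = y_{..}/an. \tag{2.3.3} $$

Since it is an algebraic identity, equation (2.3.1) must hold for any value of $y_{ij}$.
Squaring both sides of (2.3.1) and summing over $i$ and $j$, we have

$$ \sum_{i=1}^a \sum_{j=1}^n (y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^a \sum_{j=1}^n (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^a \sum_{j=1}^n (y_{ij} - \bar{y}_{i.})^2 + 2\sum_{i=1}^a \sum_{j=1}^n (\bar{y}_{i.} - \bar{y}_{..})(y_{ij} - \bar{y}_{i.}). \tag{2.3.4} $$

Now, note that the cross-product term vanishes, since $\sum_{j=1}^n (y_{ij} - \bar{y}_{i.}) = 0$
for each $i$; that is,

$$ \sum_{i=1}^a \sum_{j=1}^n (\bar{y}_{i.} - \bar{y}_{..})(y_{ij} - \bar{y}_{i.}) = \sum_{i=1}^a (\bar{y}_{i.} - \bar{y}_{..}) \sum_{j=1}^n (y_{ij} - \bar{y}_{i.}) = 0 \tag{2.3.5} $$

and

$$ \sum_{i=1}^a \sum_{j=1}^n (\bar{y}_{i.} - \bar{y}_{..})^2 = n\sum_{i=1}^a (\bar{y}_{i.} - \bar{y}_{..})^2. \tag{2.3.6} $$

Then, equation (2.3.4) simplifies to the following identity:¹

$$ \sum_{i=1}^a \sum_{j=1}^n (y_{ij} - \bar{y}_{..})^2 = n\sum_{i=1}^a (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^a \sum_{j=1}^n (y_{ij} - \bar{y}_{i.})^2. \tag{2.3.7} $$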

¹ For a geometric interpretation of the algebraic partition of the sum of squares given by identity
(2.3.7), see Kendall et al. (1983, p. 11).
Equation (2.3.7) states that the sum of the squared deviations of individual
observations from the overall mean is equal to the sum of squared deviations
of the group means from the overall or grand mean plus the sum of squared
deviations of the observations from the group means. The term on the left of
(2.3.7) is called the total sum of squares and will be abbreviated as $SS_T$. The
first term on the right of (2.3.7) is called the sum of squares "between groups"
or "due to treatments" and is abbreviated as $SS_B$; the second term is called
the sum of squares "within groups" and is abbreviated as $SS_W$. The second term
represents the variation of the individual observations about their own sample
means, and it is sometimes called the error or residual sum of squares.
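The identity (2.3.7) can be checked numerically. The following is a minimal sketch, assuming a balanced layout (every group has $n$ observations) and hypothetical data with $a = 3$ and $n = 3$; the function name is illustrative.

```python
def sums_of_squares(groups):
    """Return (SS_T, SS_B, SS_W) for a balanced one-way layout, following (2.3.7)."""
    a, n = len(groups), len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes n observations in every group")
    group_means = [sum(g) / n for g in groups]          # ybar_i.
    grand_mean = sum(group_means) / a                   # ybar_.. (valid for balanced data)
    ss_t = sum((y - grand_mean) ** 2 for g in groups for y in g)
    ss_b = n * sum((m - grand_mean) ** 2 for m in group_means)
    ss_w = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    return ss_t, ss_b, ss_w


groups = [[4, 5, 6], [7, 8, 9], [1, 2, 3]]   # hypothetical data
ss_t, ss_b, ss_w = sums_of_squares(groups)
print(ss_t, ss_b, ss_w)                      # 60.0 54.0 6.0
assert abs(ss_t - (ss_b + ss_w)) < 1e-9      # identity (2.3.7)
```

Here the group means are 5, 8, and 2 and the grand mean is 5, so $SS_B = 3(0 + 9 + 9) = 54$ and each group contributes $1 + 0 + 1 = 2$ to $SS_W$.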


Remark: The meaning of the partition of the total sum of squares into between and 
within group sums of squares can be explained as follows. Individual observations in 
any sample will differ from each other or exhibit variability. These observed differences 
among individual observations can be ascribed to specific sources. First, some pairs of 
observations are in different treatment groups and their differences are due either to the 
different treatments, or to chance variation, or to both. The sum of squares between 
groups reflects the contribution of different treatments to intergroup differences as well 
as chance. On the other hand, observations in the same treatment group can differ 
only because of chance variation, since each observation within the group received 
exactly the same treatment. The sum of squares within groups reflects these intragroup 
differences due only to chance variation. Thus, in any sample, two kinds of variability — 
the sum of squares between groups, reflecting the variability due to treatments and 
chance, and the sum of squares within groups reflecting chance variation alone — can be 
isolated. 


2.4 THE CONCEPT OF DEGREES OF FREEDOM 


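Before we proceed further, it is time to examine the notion of degrees of
freedom. Recall the basic definition of the sample variance $S^2$ of a random
sample $y_1, y_2, \ldots, y_n$ as

$$ S^2 = \frac{1}{n - 1}\sum_{i=1}^n (y_i - \bar{y})^2 = \frac{1}{n - 1}\sum_{i=1}^n d_i^2, \tag{2.4.1} $$

where

$$ \bar{y} = \sum_{i=1}^n y_i/n \tag{2.4.2} $$

and

$$ d_i = y_i - \bar{y}. \tag{2.4.3} $$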
From (2.4.1) we note that the quantity $S^2$ is based upon a sum of squared
deviations from the sample mean. However, we know that the sum of the deviations
about the mean, defined by (2.4.3), must be zero; that is,

$$ \sum_{i=1}^n d_i = 0. \tag{2.4.4} $$


The property (2.4.4) has a very important consequence. For example, suppose
that in a random sample of $n = 4$, one is to choose all the deviations from the
mean. For the first deviation, one can pick any number, say, $d_1 = 6$. Similarly,
quite arbitrarily, one can assign two more deviations, say, $d_2 = -9$ and $d_3 = -7$.
However, when one comes to choose the value of the fourth deviation, one is no
longer free to take any number one pleases. The value of $d_4$ must be given by

$$ d_4 = 0 - (d_1 + d_2 + d_3) = 0 - (6 - 9 - 7) = 10. $$

In short, given the values of $n - 1$ deviations from the mean, which could be
any arbitrarily assigned numbers, the value of the last deviation is completely
determined. Thus, we say that there are $n - 1$ degrees of freedom for a sample
variance, reflecting the fact that only $n - 1$ deviations are "free" to be any
number. Given the values of these "free" numbers, the last value is automatically
determined.²
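The example above can be reproduced in a few lines. This sketch is illustrative (the helper name and the choice of 20 as the sample mean are hypothetical); it uses `statistics.variance` from the Python standard library, which applies the $n - 1$ divisor of (2.4.1).

```python
import statistics


def last_deviation(free_deviations):
    """Return the one deviation forced by the constraint sum(d_i) = 0 in (2.4.4)."""
    return 0 - sum(free_deviations)


free = [6, -9, -7]                          # n - 1 = 3 freely chosen deviations
deviations = free + [last_deviation(free)]
print(deviations)                           # [6, -9, -7, 10]

# Any sample with these deviations, e.g. one centred at 20, has
# S^2 = sum(d_i^2) / (n - 1) = 266 / 3.
sample = [20 + d for d in deviations]
print(statistics.variance(sample), sum(d * d for d in deviations) / (len(sample) - 1))
```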

To obtain the degrees of freedom associated with different sums of squares,
we note that since the total sum of squares ($SS_T$) is based on $an$ deviations
$y_{ij} - \bar{y}_{..}$ of the $an$ observations with one constraint on the deviations, namely,

$$ \sum_{i=1}^a \sum_{j=1}^n (y_{ij} - \bar{y}_{..}) = 0, \tag{2.4.5} $$

it has $an - 1$ degrees of freedom. Similarly, since the between group sum of
squares ($SS_B$) is computed from the deviations of the $a$ independent class means
$\bar{y}_{i.}$ from the overall mean $\bar{y}_{..}$, but with one constraint, namely,

$$ \sum_{i=1}^a (\bar{y}_{i.} - \bar{y}_{..}) = 0, \tag{2.4.6} $$

it has $a - 1$ degrees of freedom.
Finally, since the within sum of squares ($SS_W$) involves the deviations of the
$an$ observations from the $a$ sample means, it has $an - a = a(n - 1)$ degrees
of freedom.

² For a geometric interpretation of degrees of freedom, see Walker (1940).

Alternatively, the degrees of freedom corresponding to the within
groups can be argued as follows. Consider the component of the within sum of
squares corresponding to the $i$-th factor level, namely,

$$ \sum_{j=1}^n (y_{ij} - \bar{y}_{i.})^2. \tag{2.4.7} $$

The expression (2.4.7) is the equivalent of a total sum of squares considering
only the $i$-th factor level. Hence, there are $n - 1$ degrees of freedom associated
with this sum of squares. Since $SS_W$ is the sum of these component
sums of squares, the $i$-th component being given by (2.4.7), the degrees of
freedom associated with $SS_W$ is the sum of the $a$ component degrees of freedom,
namely,

$$ \sum_{i=1}^a (n - 1) = a(n - 1). \tag{2.4.8} $$


2.5 MEAN SQUARES AND THEIR EXPECTATIONS 


The next question is how to use the partition of the total sum of squares and the 
corresponding degrees of freedom in making inferences about the existence of 
treatment effects. In the analysis of variance, ordinarily, it is convenient to deal 
with the quantities known as mean squares instead of sums of squares. The mean 
squares are obtained by dividing each sum of squares by the corresponding 
degrees of freedom. We denote the two mean squares, namely, between and
within, by $MS_B$ and $MS_W$, respectively; that is, $MS_B = SS_B/(a - 1)$ and
$MS_W = SS_W/a(n - 1)$.
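The bookkeeping of sums of squares, degrees of freedom, and mean squares is easy to mechanize. A minimal sketch for a balanced layout; the function name and the input values ($a = 3$, $n = 3$, $SS_B = 54$, $SS_W = 6$) are hypothetical.

```python
def anova_mean_squares(ss_b, ss_w, a, n):
    """Rows (SS, df, MS) of a balanced one-way analysis of variance table."""
    df_b = a - 1          # between groups
    df_w = a * (n - 1)    # within groups
    df_t = a * n - 1      # total
    # (a - 1) + a(n - 1) = an - 1 always holds; kept as a check on the df partition.
    assert df_b + df_w == df_t
    return {
        "between": (ss_b, df_b, ss_b / df_b),
        "within": (ss_w, df_w, ss_w / df_w),
        "total": (ss_b + ss_w, df_t, None),   # no mean square reported for the total
    }


table = anova_mean_squares(ss_b=54.0, ss_w=6.0, a=3, n=3)
for source, (ss, df, ms) in table.items():
    print(source, ss, df, ms)
```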

Next, we examine the expectations of within and between mean squares. 
From model equation (2.1.1), we obtain

$$ \bar{y}_{i.} = \sum_{j=1}^n (\mu + \alpha_i + e_{ij})/n = \mu + \alpha_i + \bar{e}_{i.} \tag{2.5.1} $$

and

$$ \bar{y}_{..} = \sum_{i=1}^a (\mu + \alpha_i + \bar{e}_{i.})/a = \mu + \bar{\alpha}_{.} + \bar{e}_{..}. \tag{2.5.2} $$

Substituting the values of $y_{ij}$, $\bar{y}_{i.}$, and $\bar{y}_{..}$ from (2.1.1), (2.5.1), and (2.5.2),
respectively, into the expressions for $SS_W$ and $SS_B$ defined in (2.3.7), we find

$$ SS_W = \sum_{i=1}^a \sum_{j=1}^n (e_{ij} - \bar{e}_{i.})^2 \tag{2.5.3} $$

and

$$ SS_B = n\sum_{i=1}^a (\alpha_i - \bar{\alpha}_{.} + \bar{e}_{i.} - \bar{e}_{..})^2. \tag{2.5.4} $$

Now, because the $e_{ij}$'s are uncorrelated and identically distributed with mean
zero and variance $\sigma_e^2$, it follows, using the formula for the variance of the
sampling distribution of a mean, that

$$ E(e_{ij}^2) = \sigma_e^2, \tag{2.5.5} $$

$$ E(\bar{e}_{i.}^2) = \sigma_e^2/n, \tag{2.5.6} $$

and

$$ E(\bar{e}_{..}^2) = \sigma_e^2/an. \tag{2.5.7} $$


It is then a matter of straightforward simplification to derive the expectations
of mean squares. First, taking the expectation of (2.5.3), we get

$$ E(SS_W) = \sum_{i=1}^a E\left[\sum_{j=1}^n e_{ij}^2 - 2\bar{e}_{i.}\sum_{j=1}^n e_{ij} + n\bar{e}_{i.}^2\right] = \sum_{i=1}^a \left[\sum_{j=1}^n E(e_{ij}^2) - nE(\bar{e}_{i.}^2)\right]. \tag{2.5.8} $$

On substituting the values of $E(e_{ij}^2)$ and $E(\bar{e}_{i.}^2)$ from (2.5.5) and (2.5.6),
respectively, into (2.5.8), we find

$$ E(SS_W) = \sum_{i=1}^a \left(n\sigma_e^2 - n\frac{\sigma_e^2}{n}\right) = a(n - 1)\sigma_e^2. \tag{2.5.9} $$

The expectation of $MS_W$ is, therefore, given by

$$ E(MS_W) = E\left[\frac{SS_W}{a(n - 1)}\right] = \sigma_e^2. \tag{2.5.10} $$


Note that result (2.5.10) is true under both Models I and II. To derive the
expectation of $MS_B$, we first consider the case of Model I and then that of
Model II. First, if we restrict ourselves to Model I, then the $\alpha_i$'s are fixed
quantities depending on the particular levels (treatments) selected in the experiment,
with the restriction that $\bar{\alpha}_{.} = \sum_{i=1}^a \alpha_i/a = 0$. Then, on taking the expectation of (2.5.4), we
get

$$ E(SS_B) = n\left[\sum_{i=1}^a \alpha_i^2 + E\sum_{i=1}^a (\bar{e}_{i.} - \bar{e}_{..})^2\right] \tag{2.5.11} $$

by virtue of the fact that the $\alpha_i$'s are constants and the expectation of the
cross-product term vanishes.
Now, using results (2.5.6) and (2.5.7), we find that

$$ E\sum_{i=1}^a (\bar{e}_{i.} - \bar{e}_{..})^2 = \sum_{i=1}^a E(\bar{e}_{i.}^2) - aE(\bar{e}_{..}^2) = a\frac{\sigma_e^2}{n} - a\frac{\sigma_e^2}{an} = (a - 1)\frac{\sigma_e^2}{n}. \tag{2.5.12} $$

Furthermore, substituting (2.5.12) into (2.5.11), we obtain

$$ E(SS_B) = n\sum_{i=1}^a \alpha_i^2 + (a - 1)\sigma_e^2. \tag{2.5.13} $$

Finally, the expectation of $MS_B$ is given by

$$ E(MS_B) = \frac{n}{a - 1}\sum_{i=1}^a \alpha_i^2 + \sigma_e^2. \tag{2.5.14} $$


Next, consider the case of Model II when the $\alpha_i$'s are also randomly
distributed with mean zero and variance $\sigma_\alpha^2$, and the $\alpha_i$'s are uncorrelated with
each other and with the $e_{ij}$'s. It then follows, using the formula for the variance
of the sampling distribution of the mean of the $\alpha_i$'s, that

$$ E(\alpha_i^2) = \sigma_\alpha^2 \tag{2.5.15} $$

and

$$ E(\bar{\alpha}_{.}^2) = \sigma_\alpha^2/a. \tag{2.5.16} $$

Now, on taking the expectation of (2.5.4), we get

$$ E(SS_B) = n\left[E\sum_{i=1}^a (\alpha_i - \bar{\alpha}_{.})^2 + E\sum_{i=1}^a (\bar{e}_{i.} - \bar{e}_{..})^2\right]. \tag{2.5.17} $$

Using results (2.5.15) and (2.5.16), we find that

$$ E\sum_{i=1}^a (\alpha_i - \bar{\alpha}_{.})^2 = \sum_{i=1}^a E(\alpha_i^2) - aE(\bar{\alpha}_{.}^2) = a\sigma_\alpha^2 - a\frac{\sigma_\alpha^2}{a} = (a - 1)\sigma_\alpha^2. \tag{2.5.18} $$

On substituting (2.5.12) and (2.5.18) into (2.5.17), we get

$$ E(SS_B) = n\left[(a - 1)\sigma_\alpha^2 + (a - 1)\frac{\sigma_e^2}{n}\right] = n(a - 1)\sigma_\alpha^2 + (a - 1)\sigma_e^2. \tag{2.5.19} $$

Finally, the expectation of $MS_B$ is given by

$$ E(MS_B) = n\sigma_\alpha^2 + \sigma_e^2. \tag{2.5.20} $$


It is important to recognize that in the derivation of the results on expected
sums of squares and mean squares, we have not made any distributional
assumptions for the $e_{ij}$'s under Model I or for the $e_{ij}$'s and $\alpha_i$'s under Model II.
This fact has important implications for obtaining unbiased estimates of the
parameters under both Models I and II.
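The expectations (2.5.10), (2.5.14), and (2.5.20) can be checked by simulation. The sketch below is illustrative, not part of the derivation: the values $a = 5$, $n = 4$, $\sigma_e = 2$, $\sigma_\alpha = 1.5$, the fixed effects $(-2, -1, 0, 1, 2)$, $\mu = 0$ (the mean squares do not depend on $\mu$), and all function names are arbitrary choices. Uniform, deliberately non-normal errors and effects are used because the derivation above makes no distributional assumption.

```python
import math
import random


def mean_squares(groups):
    """Return (MS_B, MS_W) for a balanced one-way layout."""
    a, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / a
    ms_b = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_w = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (a * (n - 1))
    return ms_b, ms_w


def uniform_with_sd(sd, rng):
    """Mean-zero uniform draw with standard deviation sd (variance h^2/3 with h = sd*sqrt(3))."""
    half_width = sd * math.sqrt(3.0)
    return rng.uniform(-half_width, half_width)


def average_mean_squares(effects_fn, n, sigma_e, reps, rng):
    """Average MS_B and MS_W over `reps` simulated experiments with mu = 0."""
    sum_b = sum_w = 0.0
    for _ in range(reps):
        alphas = effects_fn(rng)
        groups = [[alpha + uniform_with_sd(sigma_e, rng) for _ in range(n)]
                  for alpha in alphas]
        ms_b, ms_w = mean_squares(groups)
        sum_b += ms_b
        sum_w += ms_w
    return sum_b / reps, sum_w / reps


a, n, sigma_e, sigma_alpha = 5, 4, 2.0, 1.5
fixed = [-2.0, -1.0, 0.0, 1.0, 2.0]
rng = random.Random(1)

# Model I: theory (2.5.14) gives n*sum(alpha^2)/(a-1) + sigma_e^2 = 4*10/4 + 4 = 14.
ms_b_I, ms_w_I = average_mean_squares(lambda r: fixed, n, sigma_e, 20000, rng)
# Model II: theory (2.5.20) gives n*sigma_alpha^2 + sigma_e^2 = 4*2.25 + 4 = 13.
ms_b_II, ms_w_II = average_mean_squares(
    lambda r: [uniform_with_sd(sigma_alpha, r) for _ in range(a)], n, sigma_e, 20000, rng)
print(ms_b_I, ms_w_I, ms_b_II, ms_w_II)   # close to 14, 4, 13, 4
```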


2.6 SAMPLING DISTRIBUTION OF MEAN SQUARES 


The between and within sums of squares or mean squares are functions of the
sample observations, and thus have sampling distributions. However, to
derive the form of their distributions, we require the assumption of normality for
the random components of model (2.1.1). Thus, we assume that under Model I,
the $e_{ij}$'s are completely independent and normally distributed with mean
zero and variance $\sigma_e^2$. Furthermore, under Model II, the $\alpha_i$'s are also completely
independent of each other and of the $e_{ij}$'s, and are normally distributed with
mean zero and variance $\sigma_\alpha^2$.


Now, first of all, we note that for a normal parent population with variance $\sigma^2$,

$$ \frac{\hat{\sigma}^2}{\sigma^2} \sim \frac{\chi^2[\nu]}{\nu}, \tag{2.6.1} $$

where $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$ based on $\nu$ degrees of freedom (see
Appendix C). Furthermore, under both Models I and II, we have from (2.5.10)
that

$$ E(MS_W) = \sigma_e^2. \tag{2.6.2} $$
From (2.6.1) and (2.6.2), it follows that

$$ \frac{MS_W}{\sigma_e^2} \sim \frac{\chi^2[a(n - 1)]}{a(n - 1)}; \tag{2.6.3} $$

that is, the ratio of $MS_W$ to $\sigma_e^2$ is a $\chi^2[a(n - 1)]$ variable divided by $a(n - 1)$.
To derive the sampling distribution of $MS_B$, however, we have to distinguish
between the cases of Models I and II. Under Model I, we have from (2.5.14) that

$$ E(MS_B) = \frac{n}{a - 1}\sum_{i=1}^a \alpha_i^2 + \sigma_e^2. \tag{2.6.4} $$

Therefore, $MS_B$ is an unbiased estimator of $\sigma_e^2$ only when $\alpha_i = 0$, $i = 1,
2, \ldots, a$; that is, when the effects of all the levels are the same. Hence, from (2.6.1)
and (2.6.4), it follows that when $\alpha_i = 0$ $(i = 1, 2, \ldots, a)$, we have

$$ \frac{MS_B}{\sigma_e^2} \sim \frac{\chi^2[a - 1]}{a - 1}; \tag{2.6.5} $$

that is, the ratio of $MS_B$ to $\sigma_e^2$ is a $\chi^2[a - 1]$ variable divided by $a - 1$. It is
important to note at this point that when the effects of all the levels are not the same,

$$ \frac{MS_B}{\sigma_e^2} \sim \frac{\chi'^2[a - 1, \lambda]}{a - 1}, \tag{2.6.6} $$

where $\chi'^2[a - 1, \lambda]$ represents a noncentral $\chi^2[a - 1]$ variable with the
noncentrality parameter $\lambda$ given by

$$ \lambda = \frac{n}{\sigma_e^2}\sum_{i=1}^a \alpha_i^2. \tag{2.6.7} $$


The proof of this result is beyond the scope of this volume and is not presented 
here. However, the interested reader is referred to Appendix E for a definition 
of the noncentral chi-square variable. 

Under Model II, we have from (2.5.20) that

$$ E(MS_B) = \sigma_e^2 + n\sigma_\alpha^2. \tag{2.6.8} $$

It can further be shown that

$$ \frac{MS_B}{\sigma_e^2 + n\sigma_\alpha^2} \sim \frac{\chi^2[a - 1]}{a - 1}; \tag{2.6.9} $$
that is, the ratio of $MS_B$ to $\sigma_e^2 + n\sigma_\alpha^2$ is a $\chi^2[a - 1]$ variable divided by $a - 1$.
The result (2.6.9) is true irrespective of whether $\sigma_\alpha^2 = 0$ or not. A proof of this
result can be found in Graybill (1961, pp. 344-345; 1976, pp. 609-610) and
Kempthorne and Folks (1971, pp. 467-470). Furthermore, it can be proven that
under both Models I and II, the two statistics $MS_B$ and $MS_W$ are statistically
independent. In other words, the value of $MS_W$, whether particularly large or
small, gives us no information about whether the value of $MS_B$ is particularly
large or small. The mathematical proof that $MS_B$ and $MS_W$ are independent is
rather involved and is not presented here (see, e.g., Graybill (1961, pp. 345-346;
1976, pp. 609-610); Scheffé (1959, Chapter 2)). Intuitively, one can argue
as follows. Since $SS_B$ is based solely on the group mean values, it has nothing
to do with the individual variation within any group. Similarly, $SS_W$ is based
solely on the individual variation within groups (i.e., measured from their
respective group means) and is, therefore, not affected by whatever values the group
means happen to be.
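The distributional results (2.6.3) and (2.6.9) can be probed by simulation under the normality assumptions of this section. The sketch below uses illustrative values $a = 6$, $n = 5$, $\sigma_e = 1$, $\sigma_\alpha = 0.8$ and hypothetical helper names. It checks that the scaled mean squares have mean 1 and variance close to $2/\nu$ (the mean and variance of $\chi^2[\nu]/\nu$), and that the sample correlation between $MS_B$ and $MS_W$ is near zero; zero correlation is only a consequence of independence, not a proof of it.

```python
import random
from statistics import fmean


def mean_squares(groups):
    """Return (MS_B, MS_W) for a balanced one-way layout."""
    a, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / a
    ms_b = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_w = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (a * (n - 1))
    return ms_b, ms_w


def sample_var(xs):
    m = fmean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def pearson(xs, ys):
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


a, n, sigma_e, sigma_alpha = 6, 5, 1.0, 0.8
scale_b = sigma_e ** 2 + n * sigma_alpha ** 2      # (2.6.8): 1 + 5 * 0.64 = 4.2
rng = random.Random(7)
ratios_b, ratios_w = [], []
for _ in range(20000):
    alphas = [rng.gauss(0.0, sigma_alpha) for _ in range(a)]
    groups = [[alpha + rng.gauss(0.0, sigma_e) for _ in range(n)] for alpha in alphas]
    ms_b, ms_w = mean_squares(groups)
    ratios_b.append(ms_b / scale_b)           # should behave like chi2[a-1]/(a-1)
    ratios_w.append(ms_w / sigma_e ** 2)      # should behave like chi2[a(n-1)]/(a(n-1))

print(fmean(ratios_b), sample_var(ratios_b))  # near 1 and 2/(a-1) = 0.4
print(fmean(ratios_w), sample_var(ratios_w))  # near 1 and 2/24
print(pearson(ratios_b, ratios_w))            # near 0
```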


2.7 TEST OF HYPOTHESIS: THE ANALYSIS 
OF VARIANCE F TEST 


In this section, we present the usual hypothesis about the treatment effects and 
the appropriate F test for fixed and random effects models. 


MODEL I (FIXED EFFECTS)


Under Model I, the usual null hypothesis of interest is that all the treatments
(factor levels) have the same effect; that is,

$$ H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0. \tag{2.7.1} $$

The alternative is

$$ H_1: \text{not all } \alpha_i\text{'s are zero}. $$

In order to develop a test statistic for the hypothesis (2.7.1), we note from
(2.5.10) and (2.5.14) that when $H_0$ is true

$$ E(MS_W) = \sigma_e^2 \tag{2.7.2} $$

and

$$ E(MS_B) = \sigma_e^2; \tag{2.7.3} $$

that is, both the mean squares $MS_W$ and $MS_B$ are unbiased estimates of the
same quantity $\sigma_e^2$. On the other hand, when (2.7.1) is false,

$$ E(MS_B) > E(MS_W). \tag{2.7.4} $$
Furthermore, since under $H_0$ both of these mean squares divided by $\sigma_e^2$ are
independently distributed as chi-square variables divided by their respective
degrees of freedom, it follows that their ratio is distributed as Snedecor's $F$.
Thus, the ratio

$$ F = \frac{MS_B/\sigma_e^2}{MS_W/\sigma_e^2} = \frac{MS_B}{MS_W} \tag{2.7.5} $$

is distributed as an $F$ variable with $a - 1$ and $a(n - 1)$ degrees of freedom
(see Appendix D). Hence, the null hypothesis (2.7.1) can be tested by computing
the ratio (2.7.5) and comparing it directly with the one-tailed values in the tables
of the $F$ distribution with $a - 1$ and $a(n - 1)$ degrees of freedom.³ An $\alpha$-level
is chosen in advance, and if the calculated value is greater than the $100(1 - \alpha)$th
percentage point of the $F$ distribution with $a - 1$ and $a(n - 1)$ degrees of freedom,
we may conclude that the hypothesis is false at the $\alpha$-level of significance.

The formal hypothesis testing procedure involving a fixed $\alpha$ as described is
very useful and has led to many important developments in statistical theory.
However, an alternative way to test the hypothesis (2.7.1) is in terms of the
$p$-value. The $p$-value for a sample outcome is the probability, computed under
$H_0$, of obtaining a result equal to or more extreme than the observed one. In this
case, $p = P\{F[a - 1, a(n - 1)] > F_0\}$, where $F[a - 1, a(n - 1)]$ has an $F$ distribution
with $a - 1$ and $a(n - 1)$ degrees of freedom and $F_0$ is the observed value of
the statistic (2.7.5). Larger $p$-values support $H_0$ and smaller $p$-values support
$H_1$. A fixed $\alpha$-level test can be carried out by comparing the $p$-value with the
specified $\alpha$-level: if the $p$-value is greater than the specified $\alpha$, $H_0$ is retained;
otherwise it is rejected.⁴ For a further discussion of the $p$-value, see Gibbons and Pratt (1975)
and Pratt and Gibbons (1981, pp. 23-32).
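When software is at hand, the $p$-value can be computed directly instead of read from tables. The following is a minimal stdlib-only sketch, assuming a balanced layout and hypothetical data and function names. It computes $F_0$ from (2.7.5) and $p$ through the identity $P\{F[d_1, d_2] > f\} = I_{d_2/(d_2 + d_1 f)}(d_2/2, d_1/2)$, where $I_x(p, q)$ is the regularized incomplete beta function, evaluated here with the standard continued fraction and modified Lentz method (as in Numerical Recipes). If SciPy is available, `scipy.stats.f.sf(f0, d1, d2)` gives the same tail probability.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),               # even step
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):     # odd step
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("incomplete beta continued fraction did not converge")


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f0, d1, d2):
    """P{F[d1, d2] > f0}."""
    if f0 <= 0.0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f0))


def one_way_f_test(groups):
    """F = MS_B / MS_W of (2.7.5), its degrees of freedom, and its p-value."""
    a, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / a
    df_b, df_w = a - 1, a * (n - 1)
    ms_b = n * sum((m - grand) ** 2 for m in means) / df_b
    ms_w = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / df_w
    f0 = ms_b / ms_w
    return f0, df_b, df_w, f_upper_tail(f0, df_b, df_w)


groups = [[4, 5, 6], [7, 8, 9], [1, 2, 3]]   # hypothetical data, a = 3, n = 3
f0, df_b, df_w, p = one_way_f_test(groups)
sig_level = 0.05
print(f0, df_b, df_w, p, "reject H0" if p <= sig_level else "retain H0")
```

For these data $d_1 = 2$, and then $P\{F[2, d_2] > f\} = (1 + 2f/d_2)^{-d_2/2}$ in closed form, so $p = (1 + 2 \cdot 27/6)^{-3} = 0.001$, which the sketch reproduces.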

It should be observed that the $F$ statistic defined in (2.7.5) always provides
a one-tailed test of $H_0$ in terms of the sampling distribution of $F$. This is the
case since under $H_1$,

$$ E(MS_B) > E(MS_W), $$

and thus the $F$ statistic is expected to show a value greater than one. A value of the $F$
statistic less than one can signify nothing except sampling error,⁵ or perhaps
nonrandomness of the sample observations, or violation of the assumptions.


³ The $F$ test considered here is designed for alternatives $\alpha_i \neq \alpha_{i'}$ for some pair $(i, i')$ in all possible
directions. For a discussion of monotone alternatives $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_a$, see Miller (1986,
Section 3.1.3) and references cited therein.

⁴ Hodges and Lehmann (1970, p. 317) suggest that one can consider the $p$-value as a "measure
of the degree of surprise" which the experimental data should cause in the belief of the null
hypothesis. Miller (1986, p. 2) has termed the $p$-value a "measure of the credibility" of the
null hypothesis. The smaller the value, the less credible the null hypothesis appears.

⁵ When the null hypothesis is true, one can expect a value of the $F$ ratio less than one at least 50
percent of the time.
For example, if the measurements are not made in a random order, then any
uncontrolled factor may have some varying effect on the sequence of
experiments. This could cause an increase in within-group variance but may leave the
between-group variance unaffected. Thus, if the value of the statistic (2.7.5) is
significantly less than one, then it is possible that an important uncontrolled
factor has not been randomized during the course of the experiment
and much of the usefulness of the experimental results has been invalidated.


Remarks: (i) It should be noted that

$$ E(F) = E\left(\frac{MS_B}{MS_W}\right) \neq \frac{E(MS_B)}{E(MS_W)}, $$

since, in general, the expected value of a ratio of two random variables is not equal to
the ratio of the expected values of the random variables, even though the latter may be
equal. Actually, it can be shown that when the null hypothesis is true (and $a(n - 1) > 2$),

$$ E(F) = \frac{a(n - 1)}{a(n - 1) - 2}. $$

Thus, under $H_0$,

$$ \frac{E(MS_B)}{E(MS_W)} = 1, \quad \text{but} \quad E(F) > 1. $$


(ii) The analysis of variance model (2.1.1) may also be looked upon as a way to 
explain the variation in the dependent variable. The ratio of the between-group sum 
of squares to the total sum of squares gives the proportion of the total sum of squares 
accounted for by the linear model being posited and provides a measure of how well the 
model fits the data. A very low value indicates that the model fails to explain a lot of 
variation in the dependent variable and one may want to look for additional factors that 
may help to account for a higher proportion of the variation in the dependent variable.
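The proportion described in Remark (ii) is a one-line computation once the sums of squares are available. This is a sketch assuming a balanced layout, with hypothetical data and function name; the ratio $SS_B/SS_T$ is often reported as $\eta^2$ (or $R^2$), a name the text above does not use.

```python
def proportion_explained(groups):
    """SS_B / SS_T for a balanced one-way layout."""
    a, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / a
    ss_b = n * sum((m - grand) ** 2 for m in means)
    ss_t = sum((y - grand) ** 2 for g in groups for y in g)
    return ss_b / ss_t


print(proportion_explained([[4, 5, 6], [7, 8, 9], [1, 2, 3]]))   # 54/60 = 0.9
```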


MODEL II (RANDOM EFFECTS)


Under Model II, if all the factor levels have the same effect in the population
of random effects $\alpha_i$'s, then $\sigma_\alpha^2 = 0$. Hence, the Model II analogue of the null
hypothesis (2.7.1) is

$$ H_0: \sigma_\alpha^2 = 0, \tag{2.7.6} $$

against the alternative

$$ H_1: \sigma_\alpha^2 > 0. \tag{2.7.7} $$

Again, from (2.5.10) and (2.5.20), it follows that when $H_0$ is true,

$$ E(MS_W) = \sigma_e^2 \tag{2.7.8} $$

and

$$ E(MS_B) = \sigma_e^2; \tag{2.7.9} $$

that is, both the mean squares $MS_B$ and $MS_W$ are unbiased for the error
variance $\sigma_e^2$. Furthermore, as stated in the preceding section, $MS_B/(\sigma_e^2 + n\sigma_\alpha^2)$ is
distributed as a chi-square variable divided by $a - 1$, and $MS_B$ is statistically
independent of $MS_W$. Thus, the ratio

$$ \frac{MS_B/(\sigma_e^2 + n\sigma_\alpha^2)}{MS_W/\sigma_e^2} = \left(\frac{1}{1 + n\sigma_\alpha^2/\sigma_e^2}\right)\frac{MS_B}{MS_W} \tag{2.7.10} $$

is distributed as an $F$ variable with $a - 1$ and $a(n - 1)$ degrees of freedom,
and, when $H_0$ is true (so that $\sigma_\alpha^2 = 0$), the statistic $MS_B/MS_W$ itself has this
$F$ distribution. Hence, the same test statistic as used in the case of Model I can be
employed to test the hypothesis (2.7.6).


Remarks: (i) Sometimes the hypothesis (2.7.6) versus (2.7.7) may not be a realistic choice. For example, it may be thought that some differences between the factor levels (groups) are almost certain to exist, in which case it makes little sense to test the hypothesis of no difference. However, the researcher may want to see whether $\sigma_\alpha^2 \le \sigma_e^2/2$; that is, whether the variability between groups is at most half the variability within groups. In other words, a researcher may want to test a hypothesis of the type

$$ H_0: \sigma_\alpha^2/\sigma_e^2 \le \rho_0 \quad \text{vs.} \quad H_1: \sigma_\alpha^2/\sigma_e^2 > \rho_0, \tag{2.7.11} $$

where $\rho_0$ is a specified value of $\sigma_\alpha^2/\sigma_e^2$. In this case, the statistic (2.7.10) with $\sigma_\alpha^2/\sigma_e^2 = \rho_0$ provides the proper F test, with large values of the statistic providing grounds for rejection. Thus, the test consists of rejecting $H_0$ if $MS_B/MS_W > (1 + n\rho_0)F[a - 1, a(n - 1); 1 - \alpha]$.⁶
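A minimal sketch of the decision rule for (2.7.11) follows. It assumes the F quantile is read from a table such as Appendix Table V, since Python's standard library has no F quantile function; the function name and its arguments are hypothetical. Setting `rho0 = 0` recovers the ordinary test of (2.7.6).

```python
def ratio_test_rejects(ms_b, ms_w, n, rho0, f_crit):
    """Test H0: sigma_a^2/sigma_e^2 <= rho0 vs H1: ratio > rho0 (balanced Model II).

    f_crit must be F[a - 1, a(n - 1); 1 - alpha], taken from an F table.
    Returns (reject, observed F*, rejection threshold).
    """
    f_star = ms_b / ms_w                    # observed MS_B / MS_W
    threshold = (1 + n * rho0) * f_crit     # table value scaled by (1 + n * rho0)
    return f_star > threshold, f_star, threshold


# Mean squares of the Winer et al. example in Section 2.9 (a = 4, n = 10);
# 2.87 is an approximate table value of F[3, 36; 0.95].
print(ratio_test_rejects(250, 50, 10, rho0=0.0, f_crit=2.87))   # ordinary test of (2.7.6)
print(ratio_test_rejects(250, 50, 10, rho0=0.25, f_crit=2.87))  # H0: ratio <= 0.25
```

The same observed ratio can lead to rejection of "no group variability" but not of "group variability at most a quarter of the error variability", which is the point of the remark.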

(ii) One is occasionally interested in testing the hypothesis that the overall mean ($\mu$) is equal to some given constant $\mu_0$. The test can be performed by considering the quantity $MS_\mu = an(\bar{y}_{..} - \mu_0)^2$, which has one degree of freedom. Furthermore, it can be shown that

$$ E(MS_\mu) = \begin{cases} \sigma_e^2 + an(\mu - \mu_0)^2, & \text{for Model I},\\ \sigma_e^2 + an(\mu - \mu_0)^2 + n\sigma_\alpha^2, & \text{for Model II}. \end{cases} $$

Thus, under Model I, the test can be based on the ratio $MS_\mu/MS_W$, which under $H_0$ has an F distribution with 1 and $a(n - 1)$ degrees of freedom, whereas under Model II, it can be based on the ratio $MS_\mu/MS_B$, which under $H_0$ has an F distribution with 1 and $a - 1$ degrees of freedom. Note that in this example Models I and II lead to different significance tests.
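The two denominators can be contrasted with a short sketch. It reuses the wheat summary of Section 2.13 ($\bar{y}_{..} = 76$, $MS_B = 980$, $MS_W = 163.6$, $a = 4$, $n = 6$) together with a purely hypothetical $\mu_0 = 70$. Since those data follow Model I, the Model II ratio is shown only to illustrate how the denominator and reference distribution change; the helper name is illustrative.

```python
def overall_mean_f(y_bar, mu0, a, n, ms_b, ms_w):
    """F ratios for H0: mu = mu0 in the balanced one-way model.

    Model I uses MS_W as denominator, with (1, a(n - 1)) degrees of freedom;
    Model II uses MS_B as denominator, with (1, a - 1) degrees of freedom.
    """
    ms_mu = a * n * (y_bar - mu0) ** 2      # one degree of freedom
    return {
        "MS_mu": ms_mu,
        "model_I": (ms_mu / ms_w, (1, a * (n - 1))),
        "model_II": (ms_mu / ms_b, (1, a - 1)),
    }


result = overall_mean_f(y_bar=76.0, mu0=70.0, a=4, n=6, ms_b=980.0, ms_w=163.6)
print(result)
```

Here $MS_\mu = 24 \times 6^2 = 864$; dividing by $MS_W$ gives a ratio above five, while dividing by the much larger $MS_B$ gives a ratio below one, referred to an F distribution with only 3 denominator degrees of freedom.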


6 Spjotvoll (1967) has studied the structure of optimum tests of hypotheses of the type (2.7.11).

TABLE 2.1
Analysis of Variance for Model (2.1.1)

| Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | Expected Mean Square: Model I | Expected Mean Square: Model II | F Value |
|---|---|---|---|---|---|---|
| Between | $a-1$ | $SS_B$ | $MS_B$ | $\sigma_e^2 + \frac{n}{a-1}\sum_{i=1}^{a}\alpha_i^2$ | $\sigma_e^2 + n\sigma_\alpha^2$ | $MS_B/MS_W$ |
| Within | $a(n-1)$ | $SS_W$ | $MS_W$ | $\sigma_e^2$ | $\sigma_e^2$ | |
| Total | $an-1$ | $SS_T$ | | | | |


2.8 ANALYSIS OF VARIANCE TABLE 


The results on the partition of the total sum of squares, degrees of freedom, expected 
mean squares, and the analysis of variance F test are usually summarized in the form 
of a table commonly referred to as the analysis of variance table. The table shows in a 
certain order the sums of squares and other related quantities used in the computation 
of the F test. Such a table greatly simplifies the arithmetic and algebraic details of the 
analysis, which tend to become rather complicated in more complex designs. 

In an analysis of variance table, the first column, designated as the source of variation, 
represents the partitioning of the total response variation into the various components 
included in the linear model. The second column, designated as degrees of freedom, partitions the total degrees of freedom among the sources of variation, indicating the amount of information associated with each source of variation of the model. The third column, designated as sum of squares, contains the sums of squares associated with the various components or sources of variation of the model. The fourth column, designated as mean square, lists the respective sums of squares divided by the corresponding degrees of freedom. The fifth column, designated as expected mean square, contains the expected values of the mean squares derived under the assumed analysis of variance model. The sixth column, designated as F value, contains the values
of the F ratios which are generally formed by taking ratios of two mean squares. In most 
of the worked examples presented in this volume, we have generally added a seventh 
column, designated as p-value, which contains probabilities of obtaining a result equal 
to or more extreme than the observed F ratios if the null hypothesis were true. 

Table 2.1 shows the general form of the analysis of variance table for the one-way 
classification model (2.1.1). 


2.9 POINT ESTIMATION: ESTIMATION OF TREATMENT 
EFFECTS AND VARIANCE COMPONENTS 


It should be recognized that performance of the F test and construction of 
the analysis of variance table by no means complete all the inferences the 
investigator may want to draw. The experimenter’s main objective is not always 
to test the equality of all the treatment means. In many instances, he may want to estimate various parameters or functions of parameters. Under fixed effects, the parameters of the model (2.1.1) are $\mu, \alpha_1, \ldots, \alpha_a$, and $\sigma_e^2$, which can be estimated from the sample data. The random effects version of the model (2.1.1) involves the parameters $\mu$, $\sigma_\alpha^2$, and $\sigma_e^2$, which also can be estimated.

As estimators of the parameters, we consider the best linear and best quadratic 
unbiased estimators. Under Model I, we have 


$$ E(y_{ij}) = \mu + \alpha_i. \tag{2.9.1} $$

Then it is straightforward to see that

$$ E(\bar{y}_{i.}) = \frac{(\mu + \alpha_i) + (\mu + \alpha_i) + \cdots + (\mu + \alpha_i)}{n} = \mu + \alpha_i \tag{2.9.2} $$

and

$$ E(\bar{y}_{..}) = \frac{(\mu + \alpha_1) + \cdots + (\mu + \alpha_1) + \cdots + (\mu + \alpha_a) + \cdots + (\mu + \alpha_a)}{an} = \mu, \tag{2.9.3} $$

since generally the $\alpha_i$'s are chosen such that $\sum_{i=1}^{a}\alpha_i = 0$. From (2.9.2) and (2.9.3), unbiased estimates of $\mu$ and $\alpha_i$ are

$$ \hat{\mu} = \bar{y}_{..} \tag{2.9.4} $$

and

$$ \hat{\alpha}_i = \bar{y}_{i.} - \bar{y}_{..}. \tag{2.9.5} $$

It can be shown that (2.9.4) and (2.9.5) are the so-called best linear unbiased estimates (BLUE) for $\mu$ and $\alpha_i$, respectively. With the additional assumption of normality they are the best unbiased estimates. Furthermore, from (2.5.10), it follows that $MS_W$ is an unbiased estimate of $\sigma_e^2$. In addition, it can be shown that $MS_W$ is the best quadratic unbiased estimate of $\sigma_e^2$ and, if the $e_{ij}$'s are normally distributed, $MS_W$ is the best unbiased estimate (see Graybill (1954)).
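Applied to the variety totals of the wheat example in Section 2.13 ($y_{i.} = 414, 552, 378, 480$ with $n = 6$), the estimates (2.9.4) and (2.9.5) can be computed as in the sketch below; the helper name is illustrative.

```python
def fixed_effect_estimates(totals, n):
    """Balanced Model I: mu_hat = grand mean, alpha_hat_i = group mean - grand mean."""
    means = [t / n for t in totals]               # y_bar_i.
    mu_hat = sum(totals) / (len(totals) * n)      # y_bar_..  (2.9.4)
    alpha_hat = [m - mu_hat for m in means]       # (2.9.5)
    return mu_hat, alpha_hat


mu_hat, alpha_hat = fixed_effect_estimates([414, 552, 378, 480], n=6)
print(mu_hat, alpha_hat)   # 76.0 [-7.0, 16.0, -13.0, 4.0]
```

The estimated effects sum to zero, as the side condition $\sum_i \alpha_i = 0$ requires.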

Under Model II, instead of estimating the effects directly by taking differences of the treatment means from the grand mean as in the case of Model I, the problem of major interest is the estimation of the components of variance $\sigma_e^2$ and $\sigma_\alpha^2$. One set of estimators of $\sigma_e^2$ and $\sigma_\alpha^2$ is immediately obtained using the standard method of moments based on the expected mean squares appearing in the analysis of variance table. Thus, from Table 2.1, we obtain

$$ E(MS_W) = \sigma_e^2 \tag{2.9.6} $$

and

$$ E(MS_B) = \sigma_e^2 + n\sigma_\alpha^2. \tag{2.9.7} $$

Hence, it follows that

$$ \hat{\sigma}_e^2 = MS_W \tag{2.9.8} $$

and

$$ \hat{\sigma}_\alpha^2 = (MS_B - MS_W)/n \tag{2.9.9} $$

are unbiased estimators of $\sigma_e^2$ and $\sigma_\alpha^2$, respectively.


Remarks: (i) It can be shown that the estimators (2.9.8) and (2.9.9) are also the maximum likelihood estimators⁷ (corrected for bias) of the corresponding parameters (see, e.g., Graybill (1961, pp. 338-344)). Furthermore, it can be proven that in the class of all quadratic unbiased estimators (quadratic functions of the observations), the estimators (2.9.8) and (2.9.9) have minimum variance (see Graybill and Hultquist (1961); Graybill (1976, pp. 614-615)).

(ii) The aforesaid property of "minimum variance quadratic unbiased estimation" of the above estimators of variance components holds irrespective of the form of the distribution of the random effects $\alpha_i$'s and $e_{ij}$'s. Moreover, with the additional assumption of normality, the estimators (2.9.8) and (2.9.9) can be shown to have minimum variance within the class of all unbiased estimators (see Graybill (1954); Graybill and Wortham (1956)). Thus, the estimators (2.9.8) and (2.9.9) have certain optimal properties.⁸ Nevertheless, the estimate of $\sigma_\alpha^2$ can be negative.

(iii) It is clearly embarrassing to estimate a variance component, which by definition is a nonnegative quantity, as a negative number. Several courses of action, however, are available. One procedure is to accept the negative estimate as an indication that the true population value of the variance component is close to zero, assuming of course that sampling variability produced the negative estimate. This has some intuitive appeal, but replacing a negative estimate by zero affects some statistical properties, such as the unbiasedness of the estimates. Alternatively, one may reestimate the variance components using methods of estimation that always produce nonnegative estimates. Still another alternative is to interpret the negative estimate as an indication that the assumed model is wrong. A full discussion of these results is beyond the scope of this volume. The interested reader is referred to the survey papers by Harville (1969, 1977), Searle (1971a), and Khuri and Sahai (1985) and books by Searle (1971b) and Searle et al. (1992), which contain ample discussions of this and other related topics.

7 For the nonnegative maximum likelihood estimators, see Sahai and Thompson (1973).
8 There are biased estimators that have more desirable properties in terms of the mean squared error criterion.

Knowing the estimates of $\sigma_e^2$ and $\sigma_\alpha^2$, the estimate of the total variance $\sigma_y^2$ is obtained as

$$ \hat{\sigma}_y^2 = \hat{\sigma}_e^2 + \hat{\sigma}_\alpha^2. \tag{2.9.10} $$

The fact that the total variance consists of $\sigma_e^2$ and $\sigma_\alpha^2$ permits one to make a somewhat more informative use of the estimates of $\sigma_e^2$ and $\sigma_\alpha^2$. We can take the ratio of the estimated $\sigma_\alpha^2$ to the estimated total variance ($\hat{\sigma}_y^2$) to find the estimated proportion of variance accounted for by the factor levels. It is highly informative to estimate variance components individually and to employ them in evaluating the proportions of variance accounted for by different factors. Such proportions give one of the best ways to decide whether a factor is a practically important one. For example, it is entirely possible for a given factor to give statistically significant results in a study, even though only a very small percentage of variance is attributable to that factor. This is most likely to happen, of course, if the sample size $n$ is very large. On the other hand, when there is significant evidence for effects of a factor and the factor also accounts for a relatively large percentage of variance, then this information may be an important instrument in interpreting the experimental results or in deciding how the experimental findings might be applied. Thus, in Model II experiments, when the levels of a factor are sampled, it is good practice to estimate the components of variance and to judge the significance of the factor on the basis of the explained variation in addition to the results of the F test.

In many areas of research, particularly in genetics and industrial work, interest may center on the estimation of the variance ratio⁹ $\theta = \sigma_\alpha^2/\sigma_e^2$, or the intraclass correlation defined by $\rho = \sigma_\alpha^2/(\sigma_e^2 + \sigma_\alpha^2)$ (see Appendix M for a definition of the intraclass correlation). It can be shown that the uniformly minimum variance unbiased (UMVU) estimator of $\theta$ is given by (see, e.g., Graybill (1961, pp. 378-379); Winer et al. (1991, p. 97))

$$ \hat{\theta} = \frac{[a(n-1)-2]MS_B - a(n-1)MS_W}{an(n-1)MS_W} = \frac{MS_B - mMS_W}{mnMS_W}, \tag{2.9.11} $$

where $m = a(n-1)/[a(n-1)-2]$. By way of contrast, substituting the estimators (2.9.8) and (2.9.9) directly gives

$$ \frac{\hat{\sigma}_\alpha^2}{\hat{\sigma}_e^2} = \frac{MS_B - MS_W}{nMS_W}. \tag{2.9.12} $$

9 The ratio $\theta = \sigma_\alpha^2/\sigma_e^2$ measures the size of the population variability relative to the error variability present in the data.


To get an idea of the order of magnitude of the bias in the estimator (2.9.12), consider the following data (Winer et al. 1991, pp. 97-98):

$$ a = 4, \quad n = 10, \quad MS_B = 250, \quad MS_W = 50. $$

For this example,

$$ \hat{\sigma}_e^2 = 50 \quad \text{and} \quad \hat{\sigma}_\alpha^2 = \frac{1}{10}(250 - 50) = 20. $$

Hence,

$$ \frac{\hat{\sigma}_\alpha^2}{\hat{\sigma}_e^2} = \frac{20}{50} = 0.40. $$

By comparison, (2.9.11) gives

$$ \hat{\theta} = \frac{250 - \frac{36}{34}(50)}{\frac{360}{34}(50)} = 0.372. $$

Thus, the estimator (2.9.12) is slightly positively biased in relation to the estimator (2.9.11).
The UMVU estimator of the intraclass correlation cannot be expressed in closed form (Olkin and Pratt 1958). A computer program to calculate the UMVU estimator is given by Donoghue and Collins (1990). A biased estimator, however, can be obtained by substituting the estimates of the individual components in the formula for the intraclass correlation. Thus,

$$ \hat{\rho} = \frac{\hat{\sigma}_\alpha^2}{\hat{\sigma}_e^2 + \hat{\sigma}_\alpha^2} = \frac{MS_B - MS_W}{MS_B + (n-1)MS_W}. \tag{2.9.13} $$

The estimator (2.9.13), however, can produce a negative estimate. Alternatively, an estimate of the intraclass correlation in terms of the unbiased estimate of $\theta$ is

$$ \hat{\rho}' = \frac{\hat{\theta}}{1 + \hat{\theta}}. \tag{2.9.14} $$

For the numerical example in the preceding paragraph, the estimator (2.9.13) gives

$$ \hat{\rho} = \frac{20}{50 + 20} = 0.286, $$

whereas the alternative estimator (2.9.14) in terms of the estimator of $\theta$ yields

$$ \hat{\rho}' = \frac{0.372}{1 + 0.372} = 0.271. $$

Hence, the latter estimator ($\hat{\rho}'$) gives a slightly smaller value for $\rho$ than the estimate provided by $\hat{\rho}$. For a review of other estimators of $\rho$, see Donner (1986).
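The estimates of this subsection can be reproduced with a short sketch that takes the mean squares as inputs. It assumes the balanced Model II and $a(n-1) > 2$; the function name is illustrative, and a negative value of $\hat{\sigma}_\alpha^2$ would be returned as is rather than truncated at zero.

```python
def variance_component_estimates(ms_b, ms_w, a, n):
    """Estimators (2.9.8)-(2.9.14) for the balanced one-way random effects model."""
    nu = a * (n - 1)                                      # within-group degrees of freedom
    if nu <= 2:
        raise ValueError("the UMVU estimator of theta needs a(n - 1) > 2")
    m = nu / (nu - 2)
    sigma_e2 = ms_w                                       # (2.9.8)
    sigma_a2 = (ms_b - ms_w) / n                          # (2.9.9), can be negative
    theta_umvu = (ms_b - m * ms_w) / (m * n * ms_w)       # (2.9.11)
    theta_plugin = sigma_a2 / sigma_e2                    # (2.9.12)
    rho_plugin = (ms_b - ms_w) / (ms_b + (n - 1) * ms_w)  # (2.9.13)
    rho_from_theta = theta_umvu / (1 + theta_umvu)        # (2.9.14)
    return {
        "sigma_e2": sigma_e2,
        "sigma_a2": sigma_a2,
        "sigma_y2": sigma_e2 + sigma_a2,                  # (2.9.10)
        "theta_umvu": theta_umvu,
        "theta_plugin": theta_plugin,
        "rho_plugin": rho_plugin,
        "rho_from_theta": rho_from_theta,
    }


est = variance_component_estimates(ms_b=250, ms_w=50, a=4, n=10)
for name, value in est.items():
    print(f"{name:>15}: {value:.3f}")
```

The printed values agree with the worked example: 50, 20, 0.372 for (2.9.11), 0.400 for (2.9.12), 0.286 for (2.9.13), and 0.271 for (2.9.14).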


2.10 CONFIDENCE INTERVALS 
FOR VARIANCE COMPONENTS 


An examination of the mean square and the expected mean square columns 
of the analysis of variance Table 2.1 suggests which quantities can be readily 
estimated by a confidence interval. First, note that each of the entries of the mean square column can be transformed into a chi-square variable by multiplying it by the corresponding degrees of freedom and then dividing by the corresponding expected mean square value. This chi-square variable can then be used to obtain a confidence interval for the quantity appearing in the expected mean squares column. Thus, a $100(1 - \alpha)$ percent confidence interval for $\sigma_e^2$ can be obtained by noting that

$$ \frac{a(n-1)MS_W}{\sigma_e^2} \sim \chi^2[a(n-1)]. \tag{2.10.1} $$

The desired confidence interval is then given by

$$ P\left\{\frac{a(n-1)MS_W}{\chi^2[a(n-1), 1-\alpha/2]} \le \sigma_e^2 \le \frac{a(n-1)MS_W}{\chi^2[a(n-1), \alpha/2]}\right\} = 1 - \alpha, \tag{2.10.2} $$

where $\chi^2[a(n-1), 1-\alpha/2]$ and $\chi^2[a(n-1), \alpha/2]$ denote the $100(1-\alpha/2)$th and $100(\alpha/2)$th percentage points of the chi-square distribution with $a(n-1)$ degrees of freedom.¹⁰ Similarly, a $100(1-\alpha)$ percent confidence interval for $\sigma_e^2 + n\sigma_\alpha^2$ can be obtained by noting that

$$ \frac{(a-1)MS_B}{\sigma_e^2 + n\sigma_\alpha^2} \sim \chi^2[a-1]. \tag{2.10.3} $$

The desired confidence interval is then given by

$$ P\left\{\frac{(a-1)MS_B}{\chi^2[a-1, 1-\alpha/2]} \le \sigma_e^2 + n\sigma_\alpha^2 \le \frac{(a-1)MS_B}{\chi^2[a-1, \alpha/2]}\right\} = 1 - \alpha, \tag{2.10.4} $$

where again $\chi^2[a-1, 1-\alpha/2]$ and $\chi^2[a-1, \alpha/2]$ represent the $100(1-\alpha/2)$th and $100(\alpha/2)$th percentage points of the chi-square distribution with $a - 1$ degrees of freedom.
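The intervals (2.10.2) and (2.10.4) need only chi-square percentage points. Because Python's standard library provides no chi-square quantiles, the sketch below computes them from the series expansion of the regularized lower incomplete gamma function followed by bisection; this numerical route is an assumption of the sketch, and in practice a statistical library or the Appendix tables would be used. Since $a(n-1)MS_W = SS_W$ and $(a-1)MS_B = SS_B$, the limits reduce to sums of squares divided by chi-square points. Normality is assumed, and the helper names are illustrative.

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (s + k)
        total += term
    return math.exp(s * math.log(z) - z - math.lgamma(s)) * total


def chi2_ppf(p, df, tol=1e-10):
    """Percentage point chi2[df, p] (lower-tail probability p), found by bisection."""
    lo, hi = 0.0, max(1.0, 2.0 * df)
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_intervals(ss_b, ss_w, a, n, alpha=0.05):
    """Limits (2.10.2) for sigma_e^2 and (2.10.4) for sigma_e^2 + n * sigma_a^2."""
    nu_w, nu_b = a * (n - 1), a - 1
    sigma_e2 = (ss_w / chi2_ppf(1 - alpha / 2, nu_w),
                ss_w / chi2_ppf(alpha / 2, nu_w))
    ems_b = (ss_b / chi2_ppf(1 - alpha / 2, nu_b),
             ss_b / chi2_ppf(alpha / 2, nu_b))
    return sigma_e2, ems_b


# Wheat data of Section 2.13: SS_W = 3,272 with a(n - 1) = 20 degrees of freedom.
sigma_e2_limits, _ = exact_intervals(ss_b=2940, ss_w=3272, a=4, n=6)
print(sigma_e2_limits)   # roughly (95.8, 341.2)
```

Only the interval for $\sigma_e^2$ is used for the wheat data, since those data follow Model I; the second interval returned by `exact_intervals` is meaningful only under Model II.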

Unfortunately, the expressions in the expected mean square column are the only quantities for which one can obtain exact confidence intervals by the procedure just described. In particular, an exact confidence interval for $\sigma_\alpha^2$ does not exist. Various approximate confidence intervals have been proposed in the literature. For a detailed discussion of these procedures and their relative merits, the reader is referred to Boardman (1974) and Burdick and Graybill (1988, 1992, pp. 60-63). Here, we briefly describe some procedures that have been recommended for the problem. A conservative $100(1 - 2\alpha)$ percent confidence interval based on the distributions of $MS_B$ and $MS_B/MS_W$ is obtained as (see, e.g., Williams (1962); Graybill (1976, pp. 618-620)):


$$ P\left\{\frac{(a-1)MS_B}{n\chi^2[a-1, 1-\alpha/2]}\left(1 - \frac{F[a-1, a(n-1); 1-\alpha/2]}{F^*}\right) \le \sigma_\alpha^2 \le \frac{(a-1)MS_B}{n\chi^2[a-1, \alpha/2]}\left(1 - \frac{F[a-1, a(n-1); \alpha/2]}{F^*}\right)\right\} \ge 1 - 2\alpha, \tag{2.10.5} $$

where $F^* = MS_B/MS_W$. The empirical evidence seems to indicate that the probability $1 - 2\alpha$ in (2.10.5) can be replaced by $1 - \alpha$. Similarly, two approximate $100(1-\alpha)$ percent confidence intervals based on the distribution of the ratio of mean squares are obtained as (see, e.g., Bulmer (1957); Scheffé (1959, p. 235); Searle (1971b, p. 414)):¹¹

$$ P\left\{\frac{MS_W}{nF[a-1, \infty; 1-\alpha/2]}\left(\frac{MS_B}{MS_W} - F[a-1, a(n-1); 1-\alpha/2]\right) \le \sigma_\alpha^2 \le \frac{MS_W}{nF[a-1, \infty; \alpha/2]}\left(\frac{MS_B}{MS_W} - F[a-1, a(n-1); \alpha/2]\right)\right\} \doteq 1 - \alpha \tag{2.10.6} $$

and

$$ P\left\{\frac{MS_W(F^* - F_2)(F^* + F_2 - F_4)}{nF^*F_4} \le \sigma_\alpha^2 \le \frac{MS_W(F^* - F_1)(F^* + F_1 - F_3)}{nF^*F_3}\right\} \doteq 1 - \alpha, \tag{2.10.7} $$

where $F^* = MS_B/MS_W$, $F_1 = F[a-1, a(n-1); \alpha/2]$, $F_2 = F[a-1, a(n-1); 1-\alpha/2]$, $F_3 = F[a-1, \infty; \alpha/2]$, and $F_4 = F[a-1, \infty; 1-\alpha/2]$.

Finally, an approximate procedure that seems to provide a shorter interval and has better coverage properties is given by (Ting et al. (1990); Burdick and Graybill (1992, pp. 60-61)):

$$ P\left\{\frac{1}{n}\left(MS_B - MS_W - \sqrt{V_L}\right) \le \sigma_\alpha^2 \le \frac{1}{n}\left(MS_B - MS_W + \sqrt{V_U}\right)\right\} \doteq 1 - \alpha, \tag{2.10.8} $$

where

$$ V_L = G_1^2MS_B^2 + H_2^2MS_W^2 + G_{12}MS_BMS_W, \qquad V_U = H_1^2MS_B^2 + G_2^2MS_W^2 + H_{12}MS_BMS_W, $$

with

$$ G_1 = 1 - \frac{1}{F[a-1, \infty; 1-\alpha/2]}, \qquad G_2 = 1 - \frac{1}{F[a(n-1), \infty; 1-\alpha/2]}, $$

$$ H_1 = \frac{1}{F[a-1, \infty; \alpha/2]} - 1, \qquad H_2 = \frac{1}{F[a(n-1), \infty; \alpha/2]} - 1, $$

$$ G_{12} = \frac{(F[a-1, a(n-1); 1-\alpha/2] - 1)^2 - G_1^2F^2[a-1, a(n-1); 1-\alpha/2] - H_2^2}{F[a-1, a(n-1); 1-\alpha/2]}, $$

and

$$ H_{12} = \frac{(1 - F[a-1, a(n-1); \alpha/2])^2 - H_1^2F^2[a-1, a(n-1); \alpha/2] - G_2^2}{F[a-1, a(n-1); \alpha/2]}. $$

10 A slightly shorter confidence interval could be obtained by considering unequal probabilities in each tail. Tate and Klett (1959) and Murdock and Williford (1977) provide tables of chi-square values that yield the shortest two-sided intervals.
11 A confidence interval for $\sigma_\alpha^2$ that is robust to departures from normality can be obtained using the jackknife technique (Arvesen and Schmitz (1970); Miller (1986, Section 3.6.3)).
Although we have only approximate procedures for confidence limits for $\sigma_\alpha^2$, we are able to obtain exact confidence limits for the ratio $\sigma_\alpha^2/\sigma_e^2$, the intraclass correlation $\sigma_\alpha^2/(\sigma_e^2 + \sigma_\alpha^2)$, and $\sigma_e^2/(\sigma_e^2 + \sigma_\alpha^2)$. Using the distribution result in (2.7.10), that is,

$$ \left(1 + n\frac{\sigma_\alpha^2}{\sigma_e^2}\right)^{-1}\frac{MS_B}{MS_W} \sim F[a-1, a(n-1)], \tag{2.10.9} $$

it can be shown that the probability is $1 - \alpha$ that

$$ \frac{1}{n}\left(\frac{MS_B}{MS_W}\cdot\frac{1}{F[a-1, a(n-1); 1-\alpha/2]} - 1\right) \le \frac{\sigma_\alpha^2}{\sigma_e^2} \le \frac{1}{n}\left(\frac{MS_B}{MS_W}\cdot\frac{1}{F[a-1, a(n-1); \alpha/2]} - 1\right). \tag{2.10.10} $$

Furthermore, on rearranging the inequalities in (2.10.10), it readily follows that with probability $1 - \alpha$ the following relations hold:

$$ \frac{F^* - F[a-1, a(n-1); 1-\alpha/2]}{F^* + (n-1)F[a-1, a(n-1); 1-\alpha/2]} \le \frac{\sigma_\alpha^2}{\sigma_e^2 + \sigma_\alpha^2} \le \frac{F^* - F[a-1, a(n-1); \alpha/2]}{F^* + (n-1)F[a-1, a(n-1); \alpha/2]} \tag{2.10.11} $$

and

$$ \frac{nF[a-1, a(n-1); \alpha/2]}{F^* + (n-1)F[a-1, a(n-1); \alpha/2]} \le \frac{\sigma_e^2}{\sigma_e^2 + \sigma_\alpha^2} \le \frac{nF[a-1, a(n-1); 1-\alpha/2]}{F^* + (n-1)F[a-1, a(n-1); 1-\alpha/2]}, \tag{2.10.12} $$

where $F^* = MS_B/MS_W$. The inequalities (2.10.11) and (2.10.12) provide exact confidence limits for the intraclass correlation ($\rho$) and $1 - \rho$, respectively. Since $\rho \ge 0$, negative limits are defined to be zero.
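A sketch of the exact limits (2.10.10) and (2.10.11) is given below. It takes the observed ratio $F^* = MS_B/MS_W$ and the two F percentage points as inputs, since those must come from F tables here (no F quantile function is assumed to be available); the function name is hypothetical. The $\rho$ limits are obtained from the $\theta$ limits through $\rho = \theta/(1+\theta)$, which is exactly the rearrangement described above, and, following the text, negative limits for $\rho$ are set to zero, while the limits for $\theta$ are returned unchanged.

```python
def ratio_and_icc_limits(f_star, n, f_lower, f_upper):
    """Exact limits (2.10.10) for theta = sigma_a^2 / sigma_e^2 and (2.10.11) for rho.

    f_star  = MS_B / MS_W
    f_lower = F[a - 1, a(n - 1); alpha / 2]      (from an F table)
    f_upper = F[a - 1, a(n - 1); 1 - alpha / 2]  (from an F table)
    """
    theta_lo = (f_star / f_upper - 1) / n
    theta_hi = (f_star / f_lower - 1) / n
    # rho = theta / (1 + theta); 1 + theta >= 1 - 1/n > 0, so the map is well defined
    rho_lo = max(0.0, theta_lo / (1 + theta_lo))
    rho_hi = max(0.0, theta_hi / (1 + theta_hi))
    return (theta_lo, theta_hi), (rho_lo, rho_hi)
```

Written this way, the limits for $\rho$ agree term by term with (2.10.11), because $(F^*/F_q - 1)/(F^*/F_q + n - 1) = (F^* - F_q)/(F^* + (n-1)F_q)$ for either percentage point $F_q$.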


Remark: Singhal (1987) and Groggel et al. (1988) provide methods for determining 
approximate confidence intervals for o under the assumption of nonnormality. 
2.11 COMPUTATIONAL FORMULAE AND PROCEDURE


Packaged computer programs for performing the analysis of variance are widely available; they handle calculations that would have been highly tedious or simply not feasible in the precomputer age. It is assumed that computer software is used for analysis of variance computations for all but the simplest data sets. For hand calculations, however, the definitional formulae for $SS_T$, $SS_B$, and $SS_W$ given in Section 2.3 are usually not very convenient. In the following we give useful computational formulae, which are algebraically identical to the definitional formulae. Thus,


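$$ \text{Total Sum of Squares } (SS_T) = \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n}y_{ij}^2 - \frac{y_{..}^2}{an}, \tag{2.11.1} $$

$$ \text{Between Group Sum of Squares } (SS_B) = n\sum_{i=1}^{a}(\bar{y}_{i.} - \bar{y}_{..})^2 = \frac{1}{n}\sum_{i=1}^{a}y_{i.}^2 - \frac{y_{..}^2}{an}, \tag{2.11.2} $$

and

$$ \text{Within Group Sum of Squares } (SS_W) = \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n}y_{ij}^2 - \frac{1}{n}\sum_{i=1}^{a}y_{i.}^2. \tag{2.11.3} $$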


It should be noted that the within-group sum of squares ($SS_W$) can also be computed by making use of the identity (2.3.7), giving

$$ SS_W = SS_T - SS_B. \tag{2.11.4} $$


The relation (2.11.4) can further be used to check the validity of the earlier 
computations. 

Now, the steps in the analysis of variance computations can be summarized 
as follows: 


(i) Sum the observations for each level to form $y_{i.}$ for all $i$, and then obtain the grand total $y_{..}$.
(ii) Form the sum of the squares of the individual observations to yield $\sum_{i=1}^{a}\sum_{j=1}^{n}y_{ij}^2$.
(iii) Form the sum of squares of the totals for each level and divide it by $n$ to yield $\sum_{i=1}^{a}y_{i.}^2/n$.
(iv) Square the grand total and divide it by $an$ to yield the term $y_{..}^2/an$, which is known as the correction factor.
(v) The three sums of squares can now be obtained by using the computational formulae (2.11.1) through (2.11.3).
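Steps (i) through (v) translate directly into code. The sketch below applies them to the wheat yields of Table 2.3 (Section 2.13), entered one list per variety; with integer data every division here happens to be exact, so no rounding enters. The helper name is illustrative, and the result also confirms the check (2.11.4).

```python
def computational_sums_of_squares(data):
    """Steps (i)-(v) for a balanced layout; data holds a groups of n observations."""
    a, n = len(data), len(data[0])
    totals = [sum(g) for g in data]                  # step (i): y_i.
    grand_total = sum(totals)                        #           y_..
    sum_sq = sum(y * y for g in data for y in g)     # step (ii)
    between_term = sum(t * t for t in totals) / n    # step (iii)
    correction = grand_total ** 2 / (a * n)          # step (iv): correction factor
    return {                                         # step (v)
        "CF": correction,
        "SS_T": sum_sq - correction,                 # (2.11.1)
        "SS_B": between_term - correction,           # (2.11.2)
        "SS_W": sum_sq - between_term,               # (2.11.3)
    }


wheat = [
    [96, 37, 58, 69, 73, 81],      # variety I
    [93, 81, 79, 101, 96, 102],    # variety II
    [60, 54, 78, 56, 61, 69],      # variety III
    [76, 89, 88, 84, 75, 68],      # variety IV
]
print(computational_sums_of_squares(wheat))
# {'CF': 138624.0, 'SS_T': 6212.0, 'SS_B': 2940.0, 'SS_W': 3272.0}
```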


Remark: The computational formulae given in this section are very convenient to use. 
But a word of caution is needed for those who use a computer or an electronic calculator with an eight-digit capacity. If we sum the squares of numbers containing three or more digits, we can easily exceed this capacity and thereby get erroneous results. In that case it would be better to use the definitional formulae to calculate the appropriate sums of squares.
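The same hazard persists, at a much larger scale, with double-precision floating point: when the observations share a large common offset, the shortcut $\sum y^2 - (\sum y)^2/N$ subtracts two nearly equal numbers, and the small difference is lost to rounding. The sketch below uses hypothetical data, four values that differ only in the last digits of $10^9$, whose true sum of squared deviations is 5; the helper names are illustrative.

```python
def ss_computational(values):
    """Shortcut form: sum of y^2 minus (sum of y)^2 / N, as in (2.11.1)."""
    total = sum(values)
    return sum(v * v for v in values) - total * total / len(values)


def ss_definitional(values):
    """Definitional form: sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


shifted = [1e9 + d for d in (1.0, 2.0, 3.0, 4.0)]   # true sum of squares is 5
print(ss_definitional(shifted))                      # 5.0
print(ss_computational(shifted))                     # badly wrong at this magnitude
print(ss_computational([v - shifted[0] for v in shifted]))  # 5.0 after a shift
```

The definitional form returns exactly 5.0, while the shortcut cannot recover 5 at all, because at this magnitude its intermediate results are representable only in steps of 512. Subtracting a rough location, such as the first observation, from every value before applying the shortcut avoids the problem, since sums of squares are unaffected by a shift.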


2.12 ANALYSIS OF VARIANCE FOR UNEQUAL 
NUMBERS OF OBSERVATIONS 


Equal numbers of observations for each treatment or at each factor level are 
desirable because of the simplicity of organizing the experiment and subsequent 
data analysis. Furthermore, for a given sample size, the analysis of variance procedure is most powerful, that is, it provides the smallest probability of committing a type II error, when the number of observations for each level of a factor is the same. In addition, it has been found that the F test is relatively insensitive to violations of the assumption of homogeneity of variances when the samples are of equal size. However, for a variety of reasons, it may be impossible to collect an equal number of observations at each level of the factor. Part of the data may have been lost, or certain treatment or factor levels, which are important for some other reasons, may have been emphasized by taking more observations at these levels. Thus, if more data are available at some levels than at others, we must take them all into the analysis.

The analysis of variance for the one-way classification model with unequal 
numbers of observations is essentially the same as for the balanced case. One 
needs to make only minor changes in the formulae to account for unequal 
sample sizes. To avoid unnecessary repetition, we simply indicate the necessary 
changes in notation and present a summary table of the analysis of variance. 
Thus, suppose that the factor has $a$ levels and that $n_i$ ($i = 1, 2, \ldots, a$) observations have been made at the $i$th level, giving a total of $N = \sum_{i=1}^{a}n_i$ observations in all. The analysis of variance models and their assumptions remain the same except that under Model I, in place of $\sum_{i=1}^{a}\alpha_i = 0$, we now need the restriction $\sum_{i=1}^{a}n_i\alpha_i = 0$. Also, remember that the class mean $\bar{y}_{i.}$ is now based on $n_i$ observations. With this basic change, an identical analysis can be carried out as in the balanced case without any conceptual difficulty. The details of the analysis are summarized in the form of an analysis of variance table as given in Table 2.2. The derivation of the expected between-group mean square under Models I and II is somewhat involved and can be found in Graybill (1961, pp. 351-354; 1976, pp. 517-518). The analysis of variance F test and estimation of variance components can be accomplished as before. For example, unbiased estimators of $\sigma_e^2$ and $\sigma_\alpha^2$ are given by

$$ \hat{\sigma}_e^2 = MS_W \tag{2.12.1} $$

and

$$ \hat{\sigma}_\alpha^2 = (MS_B - MS_W)/n_0, \tag{2.12.2} $$

where $n_0$ is defined following Table 2.2. The estimator (2.12.1) is still the minimum variance unbiased estimator of $\sigma_e^2$. However, the estimator (2.12.2) is not the best estimator of $\sigma_\alpha^2$, especially when the $n_i$'s differ greatly among themselves.¹²

TABLE 2.2
Analysis of Variance for Model (2.1.1) with Unequal Sample Sizes

| Source of Variation | Degrees of Freedom | Sum of Squaresᵃ | Mean Square | Expected Mean Squareᵇ: Model I | Expected Mean Squareᵇ: Model II | F Value |
|---|---|---|---|---|---|---|
| Between | $a-1$ | $SS_B$ | $MS_B$ | $\sigma_e^2 + \frac{1}{a-1}\sum_{i=1}^{a}n_i\alpha_i^2$ | $\sigma_e^2 + n_0\sigma_\alpha^2$ | $MS_B/MS_W$ |
| Within | $N-a$ | $SS_W$ | $MS_W$ | $\sigma_e^2$ | $\sigma_e^2$ | |
| Total | $N-1$ | $SS_T$ | | | | |

ᵃ The sums of squares in this case are defined as follows:

$$ SS_B = \sum_{i=1}^{a}n_i(\bar{y}_{i.} - \bar{y}_{..})^2, \qquad SS_W = \sum_{i=1}^{a}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i.})^2, \qquad SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{..})^2, $$

with

$$ \bar{y}_{i.} = \frac{1}{n_i}\sum_{j=1}^{n_i}y_{ij} \quad \text{and} \quad \bar{y}_{..} = \frac{1}{N}\sum_{i=1}^{a}\sum_{j=1}^{n_i}y_{ij} = \frac{1}{N}\sum_{i=1}^{a}n_i\bar{y}_{i.}. $$

ᵇ $n_0 = \left(N^2 - \sum_{i=1}^{a}n_i^2\right)/[N(a-1)]$. If the number of observations is the same for each group, that is, $n_1 = n_2 = \cdots = n_a = n$, then it follows that $n_0 = (a^2n^2 - an^2)/[an(a-1)] = n$. Thus, the results of the analysis of variance for the unbalanced design reduce to those for the balanced case.
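The constant $n_0$ and the estimators (2.12.1) and (2.12.2) are easy to compute. The sketch below uses illustrative helper names; it evaluates $n_0$ for the group sizes of the drug example in Section 2.13 ($n_i = 3, 4, 3, 4, 3$) and confirms that $n_0$ reduces to $n$ for a balanced design.

```python
def n_zero(sizes):
    """n_0 = (N^2 - sum of n_i^2) / [N(a - 1)], as defined under Table 2.2."""
    N, a = sum(sizes), len(sizes)
    return (N * N - sum(k * k for k in sizes)) / (N * (a - 1))


def unbalanced_components(ms_b, ms_w, sizes):
    """Unbiased ANOVA estimators (2.12.1) and (2.12.2) for the unbalanced Model II."""
    return ms_w, (ms_b - ms_w) / n_zero(sizes)


print(n_zero([3, 4, 3, 4, 3]))   # 230/68, about 3.382
print(n_zero([6, 6, 6, 6]))      # 6.0: the balanced case gives n
```

Note that $n_0$ is slightly smaller than the ordinary average group size $N/a = 3.4$ in the drug example; the two coincide only when all the $n_i$ are equal.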

12 For some further discussions on this point, see Robertson (1962) and Kendall et al. (1983, Section 36.26). For some alternative estimators of variance components for an unbalanced design, see Searle (1971b, Chapter 10).

It should be remarked that, in the case of the unbalanced design, the between-group mean square ($MS_B$), suitably scaled, does not have a chi-square distribution when $\sigma_\alpha^2 > 0$; instead, it is distributed as a weighted combination of chi-square variables. Under the null hypothesis $H_0: \sigma_\alpha^2 = 0$, the statistic $MS_B/MS_W$ has an F distribution with $a - 1$ and $N - a$ degrees of freedom and can be used to test the corresponding null hypothesis; but its distribution under the alternative ($\sigma_\alpha^2 > 0$) is much more complicated than in the corresponding balanced case. For some additional results on this topic see Singh (1987) and Donner and Koval (1989). Similarly, the normal theory confidence interval for $\sigma_e^2$ can be obtained in the usual way, but the determination of confidence intervals for $\sigma_\alpha^2$, $\sigma_\alpha^2/\sigma_e^2$, and $\sigma_\alpha^2/(\sigma_e^2 + \sigma_\alpha^2)$ is much more complicated. The interested reader is referred to the book by Burdick and Graybill (1992, pp. 68-77) for a concise discussion of methods of constructing confidence intervals in the case of unbalanced design.¹³ In passing, we may note that an exact $100(1-\alpha)$ percent confidence interval for $\sigma_e^2$ and approximate $100(1-\alpha)$ percent confidence intervals for $\sigma_\alpha^2$, $\sigma_\alpha^2/\sigma_e^2$, and the intraclass correlation $\rho = \sigma_\alpha^2/(\sigma_e^2 + \sigma_\alpha^2)$, based on the distribution of the mean squares, are given by


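$$ P\left\{\frac{(N-a)MS_W}{\chi^2[N-a, 1-\alpha/2]} \le \sigma_e^2 \le \frac{(N-a)MS_W}{\chi^2[N-a, \alpha/2]}\right\} = 1 - \alpha, \tag{2.12.3} $$

$$ P\left\{\frac{L\,MS_B^*}{(1 + n^*L)F[a-1, \infty; 1-\alpha/2]} \le \sigma_\alpha^2 \le \frac{U\,MS_B^*}{(1 + n^*U)F[a-1, \infty; \alpha/2]}\right\} \doteq 1 - \alpha, \tag{2.12.4} $$

$$ P\left\{\frac{MS_B^*}{n^*MS_WF[a-1, N-a; 1-\alpha/2]} - \frac{1}{n_{\min}} \le \frac{\sigma_\alpha^2}{\sigma_e^2} \le \frac{MS_B^*}{n^*MS_WF[a-1, N-a; \alpha/2]} - \frac{1}{n_{\max}}\right\} \doteq 1 - \alpha, \tag{2.12.5} $$

and

$$ P\left\{\frac{F^*/F[a-1, N-a; 1-\alpha/2] - 1}{F^*/F[a-1, N-a; 1-\alpha/2] + (n_0 - 1)} \le \frac{\sigma_\alpha^2}{\sigma_e^2 + \sigma_\alpha^2} \le \frac{F^*/F[a-1, N-a; \alpha/2] - 1}{F^*/F[a-1, N-a; \alpha/2] + (n_0 - 1)}\right\} \doteq 1 - \alpha, \tag{2.12.6} $$

where

$$ N = \sum_{i=1}^{a}n_i, \qquad n^* = \frac{a}{\sum_{i=1}^{a}1/n_i}, \qquad n_0 = \frac{N^2 - \sum_{i=1}^{a}n_i^2}{N(a-1)}, $$

$$ F^* = \frac{MS_B}{MS_W}, \qquad MS_B^* = \frac{n^*\sum_{i=1}^{a}(\bar{y}_{i.} - \bar{y}^*)^2}{a-1}, $$

$$ L = \frac{MS_B^*}{n^*MS_WF[a-1, N-a; 1-\alpha/2]} - \frac{1}{n_{\min}}, \qquad U = \frac{MS_B^*}{n^*MS_WF[a-1, N-a; \alpha/2]} - \frac{1}{n_{\max}}, $$

$$ n_{\min} = \min(n_1, n_2, \ldots, n_a), \qquad n_{\max} = \max(n_1, n_2, \ldots, n_a), $$

with

$$ \bar{y}_{i.} = \sum_{j=1}^{n_i}y_{ij}/n_i \quad \text{and} \quad \bar{y}^* = \sum_{i=1}^{a}\bar{y}_{i.}/a. $$

13 Methods for constructing confidence intervals for $\sigma_\alpha^2$, $\sigma_\alpha^2/\sigma_e^2$, $\sigma_e^2 + \sigma_\alpha^2$, and $\sigma_\alpha^2/(\sigma_e^2 + \sigma_\alpha^2)$ have been discussed by Thomas and Hultquist (1978), Burdick and Graybill (1984), Burdick and Eickman (1986), Burdick et al. (1986b), and Donner and Wells (1986).

TABLE 2.3
Data on Yields of Four Different Varieties of Wheat (in bushels per acre)

| Variety I | Variety II | Variety III | Variety IV |
|---|---|---|---|
| 96 | 93 | 60 | 76 |
| 37 | 81 | 54 | 89 |
| 58 | 79 | 78 | 88 |
| 69 | 101 | 56 | 84 |
| 73 | 96 | 61 | 75 |
| 81 | 102 | 69 | 68 |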


2.13 WORKED EXAMPLES FOR MODEL I 


Suppose a crop scientist wishes to test the effect of four varieties of wheat on 
the resultant yield. She designs an experiment with 24 plots of the same size 
and shape and sows each variety at random in 6 of the 24 plots. The yields from 
these 24 plots provide the data for a one-way classification with equal sample 
sizes and are presented in Table 2.3. 

The data from the experiment just described must be analyzed under Model I 
since the four varieties are specially selected by the experimenter to be of 
particular interest to her. Hence, the factor under investigation (varieties of 
wheat) will have a fixed effect. In this example, a = 4, n = 6, and the resultant 
calculations for the sums of squares, using the computational formulae given 
in Section 2.11, are summarized in the following. 
The marginal totals corresponding to the different varieties are

$$ y_{1.} = 414, \quad y_{2.} = 552, \quad y_{3.} = 378, \quad y_{4.} = 480; $$

and the grand total is

$$ y_{..} = 1{,}824. $$

The other quantities needed in the calculations of the sums of squares are:

$$ \frac{y_{..}^2}{an} = \frac{(1{,}824)^2}{24} = 138{,}624, $$

$$ \frac{1}{n}\sum_{i=1}^{a}y_{i.}^2 = \frac{1}{6}[(414)^2 + (552)^2 + (378)^2 + (480)^2] = 141{,}564, $$

and

$$ \sum_{i=1}^{a}\sum_{j=1}^{n}y_{ij}^2 = (96)^2 + (37)^2 + \cdots + (75)^2 + (68)^2 = 144{,}836. $$

The resultant sums of squares are, therefore, obtained as follows:

$$ SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n}y_{ij}^2 - \frac{y_{..}^2}{an} = 144{,}836 - 138{,}624 = 6{,}212, $$

$$ SS_B = \frac{1}{n}\sum_{i=1}^{a}y_{i.}^2 - \frac{y_{..}^2}{an} = 141{,}564 - 138{,}624 = 2{,}940, $$

and

$$ SS_W = \sum_{i=1}^{a}\sum_{j=1}^{n}y_{ij}^2 - \frac{1}{n}\sum_{i=1}^{a}y_{i.}^2 = 144{,}836 - 141{,}564 = 3{,}272. $$


Finally, the results of the analysis of variance calculations are summarized in Table 2.4. If we choose the significance level $\alpha = 0.05$, we find from Appendix Table V that the 95th percentage point of the F distribution with 3 and 20 degrees of freedom is 3.10. Since the value of the F statistic from Table 2.4 is 5.99, which is greater than 3.10 ($p = 0.004$), we may conclude that the effects of the four varieties of wheat are significantly different. Stated another way, the response variability attributable to the means of varieties is significantly greater than the variability due to uncontrolled experimental error. The conclusion of the F test from Table 2.4 may not have surprised the agricultural investigator. After all, she conducted the study because she expected the four varieties of wheat to have different effects on yield and was interested in finding which varieties lead to higher yield. We discuss this problem, namely, how to study the nature of the factor level effects when differences exist, in Section 2.19.

TABLE 2.4
Analysis of Variance for the Yields Data of Table 2.3

| Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | Expected Mean Square | F Value | p-Value |
|---|---|---|---|---|---|---|
| Between Varieties | 3 | 2,940 | 980.000 | $\sigma_e^2 + \frac{6}{3}\sum_{i=1}^{4}\alpha_i^2$ | 5.99 | 0.004 |
| Within Varieties | 20 | 3,272 | 163.600 | $\sigma_e^2$ | | |
| Total | 23 | 6,212 | | | | |

TABLE 2.5
Data on Blood Analysis of Animals Injected with Five Drugs

| Drug A | Drug B | Drug C | Drug D | Drug E |
|---|---|---|---|---|
| 19.0 | 7.0 | 4.0 | 6.0 | 6.0 |
| 11.0 | 1.0 | 7.0 | 6.0 | 4.0 |
| 15.0 | 4.0 | 7.0 | 6.0 | 2.0 |
| | 4.0 | | 10.0 | |

For another example, involving unequal sample sizes, suppose a pharmaceutical research company conducts an experiment to compare the efficacy of five drugs. There are 20 animals available for the trial, and each drug is injected into 4 randomly selected animals. Three animals die during the course of the experiment. Blood samples from the remaining animals are taken and analyzed. The data on blood pH readings from each blood analysis, in certain standardized units, are presented in Table 2.5.

The data of Table 2.5 should again be analyzed under Model I, since the five drugs were specially chosen by the company as being of particular interest. Hence, the factor under investigation (drugs) has a systematic effect. In this example, $a = 5$, $n_1 = 3$, $n_2 = 4$, $n_3 = 3$, $n_4 = 4$, and $n_5 = 3$, and the resulting calculations for the sums of squares are summarized in the following.

The marginal totals corresponding to the five different drugs are
$$y_{1.} = 45.0,\quad y_{2.} = 16.0,\quad y_{3.} = 18.0,\quad y_{4.} = 28.0,\quad y_{5.} = 12.0;$$
and the grand total is
$$y_{..} = 119.0.$$


The other quantities needed in the calculations of the sums of squares are obtained as
$$\frac{y_{..}^2}{N} = \frac{(119)^2}{17} = 833.000,$$
$$\sum_{i=1}^{a}\frac{y_{i.}^2}{n_i} = \frac{(45)^2}{3} + \frac{(16)^2}{4} + \frac{(18)^2}{3} + \frac{(28)^2}{4} + \frac{(12)^2}{3} = 1{,}091.000,$$
and
$$\sum_{i=1}^{a}\sum_{j=1}^{n_i} y_{ij}^2 = (19)^2 + (11)^2 + \cdots + (2)^2 = 1{,}167.000.$$
The resulting sums of squares are, therefore, given as
$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n_i} y_{ij}^2 - \frac{y_{..}^2}{N} = 1{,}167.000 - 833.000 = 334.000,$$
$$SS_B = \sum_{i=1}^{a}\frac{y_{i.}^2}{n_i} - \frac{y_{..}^2}{N} = 1{,}091.000 - 833.000 = 258.000,$$
and
$$SS_W = \sum_{i=1}^{a}\sum_{j=1}^{n_i} y_{ij}^2 - \sum_{i=1}^{a}\frac{y_{i.}^2}{n_i} = 1{,}167.000 - 1{,}091.000 = 76.000.$$
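The computational formulae above can be checked mechanically. The following Python sketch (standard library only; the function name `one_way_sums_of_squares` is our own, not from the text) recomputes $SS_T$, $SS_B$, $SS_W$, and the F ratio for the unequal-sample-size data of Table 2.5.

```python
def one_way_sums_of_squares(groups):
    """Return (SS_T, SS_B, SS_W) for a one-way layout given as a list of groups.

    Uses the computational formulae: the correction term y..^2 / N, the raw sum
    of squares, and sum(y_i.^2 / n_i). Group sizes may differ.
    """
    observations = [y for group in groups for y in group]
    n_total = len(observations)
    correction = sum(observations) ** 2 / n_total
    raw_ss = sum(y * y for y in observations)
    between_raw = sum(sum(group) ** 2 / len(group) for group in groups)
    return raw_ss - correction, between_raw - correction, raw_ss - between_raw


# Table 2.5: blood pH readings for drugs A-E (three animals died, so sizes differ)
blood_ph = [
    [19.0, 11.0, 15.0],        # A
    [7.0, 1.0, 4.0, 4.0],      # B
    [4.0, 7.0, 7.0],           # C
    [6.0, 6.0, 6.0, 10.0],     # D
    [6.0, 4.0, 2.0],           # E
]

ss_t, ss_b, ss_w = one_way_sums_of_squares(blood_ph)
a = len(blood_ph)
n_total = sum(len(group) for group in blood_ph)
ms_b = ss_b / (a - 1)            # a - 1 = 4 degrees of freedom
ms_w = ss_w / (n_total - a)      # N - a = 12 degrees of freedom
f_value = ms_b / ms_w
print(f"SS_T={ss_t:.3f}  SS_B={ss_b:.3f}  SS_W={ss_w:.3f}  F={f_value:.2f}")
# prints: SS_T=334.000  SS_B=258.000  SS_W=76.000  F=10.18
```

Because every group total is used only through $y_{i.}^2/n_i$, the same function handles balanced and unbalanced layouts alike.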


Finally, the results of the analysis of variance calculations are summarized in Table 2.6. If we choose the level of significance $\alpha = 0.05$, we find from Appendix Table V that the 95th percentage point of the F distribution with 4 and 12 degrees of freedom is 3.26. Since the value of the F statistic from Table 2.6 is 10.18, which is greater than 3.26 (p < 0.001), we may conclude that the different drugs do not lead to the same mean response; that is, there is a relation between the injected drug and the pH reading from the blood analysis.

TABLE 2.6
Analysis of Variance for the Blood Analysis Data of Table 2.5

Source of        Degrees of   Sum of     Mean      Expected                                              F Value   p-Value
Variation        Freedom      Squares    Square    Mean Square
Between Drugs    4            258.000    64.500    $\sigma_e^2 + \frac{1}{4}\sum_{i=1}^{5} n_i\alpha_i^2$    10.18     <0.001
Within Drugs     12           76.000     6.333     $\sigma_e^2$
Total            16           334.000

This conclusion may not have surprised the drug company. In the first place, it conducted the study because it suspected that the five drugs would have different reactions, and it was interested in finding out the nature of these differences. In Section 2.19, we discuss the second stage of the analysis, namely, how to study the nature of the factor level effects when differences exist.


2.14 WORKED EXAMPLES FOR MODEL II


Suppose a college admissions office wishes to study the results of the interview ratings of prospective students by its staff members. Five staff members are selected at random, and four prospective students are assigned randomly to each. The results provide the data for a one-way classification with equal sample sizes and are given in Table 2.7.

TABLE 2.7
Interview Ratings by Five Staff Members

Staff Members
I     II    III   IV    V
86    67    57    83    85
75    86    74    80    84
94    90    71    96    92
86    76    55    99    91

The data of Table 2.7 should be analyzed under Model II, since the five staff members are randomly selected from the list of college staff and the results of the analysis are to be valid for the entire pool of college staff. Hence, the factor under study should be regarded as having random effects. Here, $a = 5$ and $n = 4$, and the resulting calculations for the sums of squares using the computational formulae are given in the following.

The marginal totals for ratings corresponding to the five staff members are
$$y_{1.} = 341,\quad y_{2.} = 319,\quad y_{3.} = 257,\quad y_{4.} = 358,\quad y_{5.} = 352;$$
and the grand total is
$$y_{..} = 1{,}627.$$


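The other quantities needed in the calculations of the sums of squares are obtained as
$$\frac{y_{..}^2}{an} = \frac{(1{,}627)^2}{5 \times 4} = 132{,}356.450,$$
$$\frac{1}{n}\sum_{i=1}^{a} y_{i.}^2 = \frac{1}{4}\left[(341)^2 + (319)^2 + (257)^2 + (358)^2 + (352)^2\right] = 134{,}039.750,$$
and
$$\sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 = (86)^2 + (75)^2 + \cdots + (92)^2 + (91)^2 = 135{,}137.$$
The resulting sums of squares are, therefore, given as
$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{y_{..}^2}{an} = 135{,}137 - 132{,}356.450 = 2{,}780.550,$$
$$SS_B = \frac{1}{n}\sum_{i=1}^{a} y_{i.}^2 - \frac{y_{..}^2}{an} = 134{,}039.750 - 132{,}356.450 = 1{,}683.300,$$
and
$$SS_W = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{1}{n}\sum_{i=1}^{a} y_{i.}^2 = 135{,}137 - 134{,}039.750 = 1{,}097.250.$$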
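For a balanced layout the same arithmetic simplifies, since every group has the same size $n$. A minimal Python sketch for the interview ratings of Table 2.7 (variable names are our own) reproduces the sums of squares and mean squares above.

```python
# Table 2.7: interview ratings, one row per randomly selected staff member
ratings = [
    [86, 75, 94, 86],   # I
    [67, 86, 90, 76],   # II
    [57, 74, 71, 55],   # III
    [83, 80, 96, 99],   # IV
    [85, 84, 92, 91],   # V
]

a = len(ratings)        # number of staff members (groups)
n = len(ratings[0])     # ratings per staff member (balanced design)
grand_total = sum(sum(row) for row in ratings)

correction = grand_total ** 2 / (a * n)                        # y..^2 / (an)
ss_t = sum(y * y for row in ratings for y in row) - correction
ss_b = sum(sum(row) ** 2 for row in ratings) / n - correction  # (1/n) sum y_i.^2 - correction
ss_w = ss_t - ss_b

ms_b = ss_b / (a - 1)          # 4 degrees of freedom
ms_w = ss_w / (a * (n - 1))    # 15 degrees of freedom
print(f"SS_B={ss_b:.3f}  SS_W={ss_w:.3f}  MS_B={ms_b:.3f}  MS_W={ms_w:.3f}  F={ms_b / ms_w:.3f}")
```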


Finally, the results of the analysis of variance calculations are summarized in Table 2.8. Here, the null hypothesis states that the variability in interview rating among staff members is due entirely to the natural variability among students, whereas the alternative hypothesis states that there is an additional variability among staff members due primarily to differences in staff members' rating schemes. If we choose the level of significance $\alpha = 0.05$, we find from Appendix Table V that the 95th percentage point of the F distribution with 4 and 15 degrees of freedom is 3.06. Since the computed F value of 5.753 from Table 2.8 is greater than 3.06 (p = 0.005), we may conclude that $\sigma_\alpha^2 > 0$, that is, that the mean ratings of the staff members differ significantly.

TABLE 2.8
Analysis of Variance for the Interview Ratings Data of Table 2.7

Source of        Degrees of   Sum of       Mean       Expected                          F Value   p-Value
Variation        Freedom      Squares      Square     Mean Square
Between Staff    4            1,683.300    420.825    $\sigma_e^2 + 4\sigma_\alpha^2$    5.753     0.005
Within Staff     15           1,097.250    73.150     $\sigma_e^2$
Total            19           2,780.550

Furthermore, if the experimenter is interested in estimating the magnitudes of the components of variance $\sigma_e^2$ and $\sigma_\alpha^2$, we may obtain their unbiased estimates using the formulae (2.9.8) and (2.9.9). Hence, from (2.9.8) and (2.9.9), we find that
$$\hat{\sigma}_e^2 = 73.150$$
and
$$\hat{\sigma}_\alpha^2 = \frac{420.825 - 73.150}{4} = 86.919.$$
The estimate of the total variance $\sigma_y^2$ is then given by
$$\hat{\sigma}_y^2 = \hat{\sigma}_e^2 + \hat{\sigma}_\alpha^2 = 73.150 + 86.919 = 160.069,$$
and the estimated proportion of the total variance accounted for by the staff members is
$$\frac{\hat{\sigma}_\alpha^2}{\hat{\sigma}_y^2} = \frac{86.919}{160.069} = 0.543.$$
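The estimates above come from equating each mean square to its expectation in Table 2.8, $E(MS_W) = \sigma_e^2$ and $E(MS_B) = \sigma_e^2 + n\sigma_\alpha^2$. A minimal Python sketch of that step follows; the function name is hypothetical, and we assume (2.9.8) and (2.9.9) are these ANOVA estimators, as the numbers above indicate.

```python
def variance_components(ms_between, ms_within, n):
    """ANOVA (method-of-moments) estimates for the balanced one-way random model.

    Solves E(MS_W) = sigma_e^2 and E(MS_B) = sigma_e^2 + n * sigma_alpha^2.
    The sigma_alpha^2 estimate is negative whenever MS_B < MS_W.
    """
    sigma2_e = ms_within
    sigma2_alpha = (ms_between - ms_within) / n
    total = sigma2_e + sigma2_alpha
    return sigma2_e, sigma2_alpha, total, sigma2_alpha / total


# Mean squares from Table 2.8 (n = 4 ratings per staff member)
s2_e, s2_alpha, s2_y, share = variance_components(420.825, 73.150, 4)
print(f"sigma_e^2={s2_e:.3f}  sigma_alpha^2={s2_alpha:.3f}  "
      f"sigma_y^2={s2_y:.3f}  proportion={share:.3f}")
```

The function returns the raw estimate even when it is negative; whether to truncate it at zero is a separate decision.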


Thus, we observe that about 54 percent of the variance among interview rat- 
ings seems to be due to differences among staff members. This would be a most 
important finding in such an experiment, as it would suggest that repetitions 
of this experiment involving different staff members would not be comparable. 
A change in experimental procedure or some better control over staff ratings
would clearly be advisable. 
To obtain a 95 percent confidence interval for $\sigma_e^2$, we have
$$MS_W = 73.150,\quad \chi^2[15, 0.025] = 6.262,\quad \text{and}\quad \chi^2[15, 0.975] = 27.488.$$
Substituting these values in (2.10.2), the desired 95 percent confidence interval for $\sigma_e^2$ is given by
$$P\left[\frac{15 \times 73.150}{27.488} < \sigma_e^2 < \frac{15 \times 73.150}{6.262}\right] = 0.95$$
or
$$P[39.917 < \sigma_e^2 < 175.224] = 0.95.$$
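The interval rests on $\nu_2 MS_W/\sigma_e^2$ having a chi-square distribution with $\nu_2$ degrees of freedom. A minimal Python sketch (hypothetical function name) takes the chi-square percentage points as inputs, exactly as read from the table above, rather than computing them.

```python
def error_variance_ci(ms_within, df_within, chi2_lower, chi2_upper):
    """Exact interval for sigma_e^2 from df * MS_W / sigma_e^2 ~ chi-square(df).

    chi2_lower and chi2_upper are the alpha/2 and 1 - alpha/2 percentage points
    of the chi-square distribution with df_within degrees of freedom, supplied
    from a table.
    """
    scaled = df_within * ms_within
    return scaled / chi2_upper, scaled / chi2_lower


lower, upper = error_variance_ci(73.150, 15, chi2_lower=6.262, chi2_upper=27.488)
print(f"{lower:.3f} < sigma_e^2 < {upper:.3f}")
```

The same function applies to the unequal-sample-size case with $N - a$ in place of $a(n-1)$; for example, `error_variance_ci(3.378, 47, 29.956, 67.821)` returns approximately (2.341, 5.300).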


Similarly, to obtain a 95 percent conservative confidence interval for $\sigma_\alpha^2$ from (2.10.5), we have
$$\chi^2[4, 0.025] = 0.484,\quad \chi^2[4, 0.975] = 11.143,$$
$$F[4, 15; 0.025] = 0.116,\quad \text{and}\quad F[4, 15; 0.975] = 3.804.$$
Substituting appropriate values in (2.10.5), the desired 95 percent confidence interval for $\sigma_\alpha^2$ is given by
$$P\left[\frac{4 \times 420.825}{4 \times 11.143}\left(1 - \frac{3.804}{5.753}\right) < \sigma_\alpha^2 < \frac{4 \times 420.825}{4 \times 0.484}\left(1 - \frac{0.116}{5.753}\right)\right] \ge 0.95$$
or
$$P[12.794 < \sigma_\alpha^2 < 851.942] \ge 0.95.$$
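The arithmetic of the conservative interval can be written as a small function. This Python sketch mirrors the displayed expression: each limit is $\nu_1 MS_B/(n\chi^2)$ multiplied by a factor $1 - F/F^*$, with $F^* = MS_B/MS_W$. The function name is hypothetical, and all table values are passed in by the caller.

```python
def conservative_ci_sigma2_alpha(ms_between, f_star, df_between, n,
                                 chi2_lower, chi2_upper, f_lower, f_upper):
    """Conservative interval for sigma_alpha^2 with the structure shown above.

    chi2_lower/chi2_upper: chi-square points with df_between d.f.
    f_lower/f_upper: F points with (df_between, df_within) d.f.
    f_star: observed MS_B / MS_W.
    """
    scale = df_between * ms_between / n
    lower = scale / chi2_upper * (1 - f_upper / f_star)
    upper = scale / chi2_lower * (1 - f_lower / f_star)
    return lower, upper


lower, upper = conservative_ci_sigma2_alpha(
    ms_between=420.825, f_star=5.753, df_between=4, n=4,
    chi2_lower=0.484, chi2_upper=11.143, f_lower=0.116, f_upper=3.804)
print(f"{lower:.3f} < sigma_alpha^2 < {upper:.3f}")
```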


Furthermore, to obtain a 95 percent approximate confidence interval for $\sigma_\alpha^2$ from (2.10.6) and (2.10.7), we have
$$F[4, \infty; 0.025] = 0.121\quad \text{and}\quad F[4, \infty; 0.975] = 2.790.$$
Substituting appropriate values in (2.10.6), the desired 95 percent confidence interval for $\sigma_\alpha^2$ is given by
$$P\left[\frac{73.150}{4 \times 2.790}(5.753 - 3.804) < \sigma_\alpha^2 < \frac{73.150}{4 \times 0.121}(5.753 - 0.116)\right] = 0.95$$
or
$$P[12.775 < \sigma_\alpha^2 < 851.956] = 0.95.$$
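Unlike the conservative interval, this approximation needs $MS_W$ rather than $MS_B$, together with F points that have infinite denominator degrees of freedom. A Python sketch of the displayed arithmetic (hypothetical function name, table values supplied by the caller):

```python
def approx_ci_sigma2_alpha(ms_within, f_star, n, f_lower, f_upper,
                           f_inf_lower, f_inf_upper):
    """Approximate interval for sigma_alpha^2 with the structure shown above.

    f_lower/f_upper: F points with (df_between, df_within) d.f.
    f_inf_lower/f_inf_upper: F points with (df_between, infinity) d.f.
    """
    lower = ms_within * (f_star - f_upper) / (n * f_inf_upper)
    upper = ms_within * (f_star - f_lower) / (n * f_inf_lower)
    return lower, upper


lower, upper = approx_ci_sigma2_alpha(
    ms_within=73.150, f_star=5.753, n=4,
    f_lower=0.116, f_upper=3.804, f_inf_lower=0.121, f_inf_upper=2.790)
print(f"{lower:.3f} < sigma_alpha^2 < {upper:.3f}")
```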
Similarly, substituting the appropriate values in (2.10.7), the desired 95 percent confidence interval for $\sigma_\alpha^2$ is given by
$$P\left[\frac{73.150(5.753 - 3.804)(5.753 + 3.804 - 2.790)}{4 \times 5.753 \times 2.790} < \sigma_\alpha^2 < \frac{73.150(5.753 - 0.116)(5.753 + 0.116 - 0.121)}{4 \times 5.753 \times 0.121}\right] = 0.95$$
or
$$P[15.027 < \sigma_\alpha^2 < 851.215] = 0.95.$$
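The second approximation adds a correction factor $(F^* + F - F_\infty)/F^*$ to each limit. This Python sketch reproduces the displayed expression term by term; the function name is hypothetical.

```python
def approx_ci_sigma2_alpha_corrected(ms_within, f_star, n, f_lower, f_upper,
                                     f_inf_lower, f_inf_upper):
    """Approximate interval for sigma_alpha^2 with the structure shown above.

    Same inputs as the previous approximation; each limit is multiplied by
    (f_star + f - f_inf) / f_star.
    """
    lower = (ms_within * (f_star - f_upper) * (f_star + f_upper - f_inf_upper)
             / (n * f_star * f_inf_upper))
    upper = (ms_within * (f_star - f_lower) * (f_star + f_lower - f_inf_lower)
             / (n * f_star * f_inf_lower))
    return lower, upper


lower, upper = approx_ci_sigma2_alpha_corrected(
    ms_within=73.150, f_star=5.753, n=4,
    f_lower=0.116, f_upper=3.804, f_inf_lower=0.121, f_inf_upper=2.790)
print(f"{lower:.3f} < sigma_alpha^2 < {upper:.3f}")
```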


Likewise, to obtain a 95 percent confidence interval for $\sigma_\alpha^2$ from (2.10.8), we compute the following quantities:
$$G_1 = 0.6416,\quad G_2 = 0.4536,\quad H_1 = 7.2645,\quad H_2 = 1.3981,$$
$$G_{12} = -0.1289,\quad H_{12} = -1.1587,$$
$$V_L = 79{,}392.0996\quad \text{and}\quad V_U = 9{,}311{,}190.0700.$$
Substituting appropriate values in (2.10.8), we obtain
$$P[16.477 < \sigma_\alpha^2 < 849.775] = 0.95.$$
Note that this interval is slightly shorter than the intervals for $\sigma_\alpha^2$ reported previously.

Finally, substituting the appropriate values in (2.10.10) through (2.10.12), the desired 95 percent confidence intervals for $\sigma_\alpha^2/\sigma_e^2$, $\sigma_\alpha^2/(\sigma_e^2 + \sigma_\alpha^2)$, and $\sigma_e^2/(\sigma_e^2 + \sigma_\alpha^2)$ are given by
$$P\left[\frac{1}{4}\left(\frac{5.753}{3.804} - 1\right) < \frac{\sigma_\alpha^2}{\sigma_e^2} < \frac{1}{4}\left(\frac{5.753}{0.116} - 1\right)\right] = 0.95$$
or
$$P\left[0.128 < \frac{\sigma_\alpha^2}{\sigma_e^2} < 12.149\right] = 0.95,$$
$$P\left[\frac{5.753 - 3.804}{5.753 + (4 - 1)3.804} < \frac{\sigma_\alpha^2}{\sigma_e^2 + \sigma_\alpha^2} < \frac{5.753 - 0.116}{5.753 + (4 - 1)0.116}\right] = 0.95$$
or
$$P\left[0.114 < \frac{\sigma_\alpha^2}{\sigma_e^2 + \sigma_\alpha^2} < 0.924\right] = 0.95,$$
and
$$P\left[\frac{4 \times 0.116}{5.753 + (4 - 1)0.116} < \frac{\sigma_e^2}{\sigma_e^2 + \sigma_\alpha^2} < \frac{4 \times 3.804}{5.753 + (4 - 1)3.804}\right] = 0.95$$
or
$$P\left[0.076 < \frac{\sigma_e^2}{\sigma_e^2 + \sigma_\alpha^2} < 0.887\right] = 0.95.$$

For another example involving unequal sample sizes, consider the data from an experiment reported by Snedecor (1934), who compared the yields of a number of varieties of corn, each variety being represented by several inbred lines. The data on yields (in bushels per acre) of the inbred lines of six varieties are given in Table 2.9.

TABLE 2.9
Data on Yields of Six Varieties of Corn (in bushels per acre)

Varieties
Four Country   Silver King   Iodent   Lancaster   Osterland   Clark
7.3            7.7           6.9      9.6         4.8         4.3
4.5            5.4           6.8      7.8         9.2         8.4
7.4            5.2           7.6      9.6         8.5         6.6
7.4            4.0           8.1      7.7         8.8         4.9
5.0                          9.4      8.2         7.9         5.8
5.9                          12.0     7.3         5.9         7.6
6.4                          15.9     11.3        9.2         3.7
6.3                          7.4      9.5
5.0                          9.0      8.8
6.1                          5.2      8.4
7.9                          9.2      6.8
5.7                          8.6

Source: Snedecor (1934). Used with permission.

The data in Table 2.9 should again be analyzed under Model II, since each variety of corn is represented by several inbred lines and the results of the analysis are to be applicable to all the varieties. Here, $a = 6$, $n_1 = 12$, $n_2 = 4$, $n_3 = 12$, $n_4 = 11$, $n_5 = 7$, $n_6 = 7$, $N = \sum_{i=1}^{a} n_i = 53$, and $n_0 = \left(N^2 - \sum_{i=1}^{a} n_i^2\right)/\{N(a - 1)\} = 8.626$; the resulting calculations for the sums of squares are given in the following.

The marginal totals for yields corresponding to the six varieties of corn are
$$y_{1.} = 74.9,\quad y_{2.} = 22.3,\quad y_{3.} = 106.1,\quad y_{4.} = 95.0,\quad y_{5.} = 54.3,\quad y_{6.} = 41.3;$$
and the grand total is
$$y_{..} = 393.9.$$


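The other quantities needed in the calculations of the sums of squares are obtained as
$$\frac{y_{..}^2}{N} = \frac{(393.9)^2}{53} = 2{,}927.495,$$
$$\sum_{i=1}^{a}\frac{y_{i.}^2}{n_i} = \frac{(74.9)^2}{12} + \frac{(22.3)^2}{4} + \frac{(106.1)^2}{12} + \frac{(95.0)^2}{11} + \frac{(54.3)^2}{7} + \frac{(41.3)^2}{7} = 3{,}015.262,$$
and
$$\sum_{i=1}^{a}\sum_{j=1}^{n_i} y_{ij}^2 = (7.3)^2 + (4.5)^2 + \cdots + (3.7)^2 = 3{,}174.010.$$
The resulting sums of squares are, therefore, given as
$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n_i} y_{ij}^2 - \frac{y_{..}^2}{N} = 3{,}174.010 - 2{,}927.495 = 246.515,$$
$$SS_B = \sum_{i=1}^{a}\frac{y_{i.}^2}{n_i} - \frac{y_{..}^2}{N} = 3{,}015.262 - 2{,}927.495 = 87.767,$$
and
$$SS_W = \sum_{i=1}^{a}\sum_{j=1}^{n_i} y_{ij}^2 - \sum_{i=1}^{a}\frac{y_{i.}^2}{n_i} = 3{,}174.010 - 3{,}015.262 = 158.748.$$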
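With unequal group sizes, the random-effects analysis also needs $n_0$, the coefficient of $\sigma_\alpha^2$ in $E(MS_B)$. This Python sketch (variable names our own) recomputes the sums of squares, $n_0$, and the F ratio directly from Table 2.9.

```python
# Table 2.9: corn yields (bushels per acre); each list holds one variety's inbred lines
corn = {
    "Four Country": [7.3, 4.5, 7.4, 7.4, 5.0, 5.9, 6.4, 6.3, 5.0, 6.1, 7.9, 5.7],
    "Silver King":  [7.7, 5.4, 5.2, 4.0],
    "Iodent":       [6.9, 6.8, 7.6, 8.1, 9.4, 12.0, 15.9, 7.4, 9.0, 5.2, 9.2, 8.6],
    "Lancaster":    [9.6, 7.8, 9.6, 7.7, 8.2, 7.3, 11.3, 9.5, 8.8, 8.4, 6.8],
    "Osterland":    [4.8, 9.2, 8.5, 8.8, 7.9, 5.9, 9.2],
    "Clark":        [4.3, 8.4, 6.6, 4.9, 5.8, 7.6, 3.7],
}

sizes = [len(ys) for ys in corn.values()]
a, n_total = len(sizes), sum(sizes)
all_obs = [y for ys in corn.values() for y in ys]

correction = sum(all_obs) ** 2 / n_total                        # y..^2 / N
between_raw = sum(sum(ys) ** 2 / len(ys) for ys in corn.values())
ss_t = sum(y * y for y in all_obs) - correction
ss_b = between_raw - correction
ss_w = ss_t - ss_b

# n0 replaces n in E(MS_B) = sigma_e^2 + n0 * sigma_alpha^2 when group sizes differ
n0 = (n_total ** 2 - sum(k * k for k in sizes)) / (n_total * (a - 1))
ms_b, ms_w = ss_b / (a - 1), ss_w / (n_total - a)
print(f"SS_B={ss_b:.3f}  SS_W={ss_w:.3f}  n0={n0:.3f}  F={ms_b / ms_w:.3f}")
```

Carrying full precision gives $F \approx 5.197$, matching the package output; the value 5.196 in the text comes from dividing the rounded mean squares.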


Finally, the results of the analysis of variance calculations are summarized in Table 2.10. Here, the null hypothesis states that the variation in yields among varieties of corn is due entirely to the natural variability among replicates, whereas the alternative hypothesis states that there is an additional variability among varieties of corn. If we choose the level of significance $\alpha = 0.05$, we find from Appendix Table V that the 95th percentage point of the F distribution with 5 and 47 degrees of freedom is 2.41. Since the computed F value of 5.196 from Table 2.10 is greater than 2.41 (p < 0.001), we may conclude that $\sigma_\alpha^2 > 0$, that is, that the mean yields of the varieties differ significantly.

TABLE 2.10
Analysis of Variance for the Yields Data of Table 2.9

Source of            Degrees of   Sum of     Mean      Expected                              F Value   p-Value
Variation            Freedom      Squares    Square    Mean Square
Between Varieties    5            87.767     17.553    $\sigma_e^2 + 8.626\sigma_\alpha^2$    5.196     <0.001
Within Varieties     47           158.748    3.378     $\sigma_e^2$
Total                52           246.515

Furthermore, if the experimenter is interested in estimating the magnitudes of the components of variance $\sigma_e^2$ and $\sigma_\alpha^2$, we may obtain their unbiased estimates using the formulae (2.12.1) and (2.12.2). Hence, from (2.12.1) and (2.12.2), we find that
$$\hat{\sigma}_e^2 = 3.378$$
and
$$\hat{\sigma}_\alpha^2 = \frac{17.553 - 3.378}{8.626} = 1.643.$$
The estimate of the total variance $\sigma_y^2$ is then given by
$$\hat{\sigma}_y^2 = \hat{\sigma}_e^2 + \hat{\sigma}_\alpha^2 = 3.378 + 1.643 = 5.021,$$
and the estimated proportion of the total variance accounted for by the varieties is
$$\frac{\hat{\sigma}_\alpha^2}{\hat{\sigma}_y^2} = \frac{1.643}{5.021} = 0.327.$$

Thus, we observe that about 33 percent of the variance among yields seems to be due to differences among varieties. The remaining 67 percent of the variance can be attributed to within-variety variation.

To obtain an exact 95 percent confidence interval for $\sigma_e^2$, we have
$$MS_W = 3.378,\quad \chi^2[47, 0.025] = 29.956,\quad \text{and}\quad \chi^2[47, 0.975] = 67.821.$$
Substituting these values in (2.12.3), the desired 95 percent confidence interval for $\sigma_e^2$ is given by
$$P\left[\frac{47 \times 3.378}{67.821} < \sigma_e^2 < \frac{47 \times 3.378}{29.956}\right] = 0.95$$
or
$$P[2.341 < \sigma_e^2 < 5.300] = 0.95.$$
Now, to obtain an approximate confidence interval for $\sigma_\alpha^2$, we have
$$F[5, 47; 0.975] = 2.851,\quad F[5, 47; 0.025] = 0.163,$$
$$F[5, \infty; 0.975] = 2.570,\quad F[5, \infty; 0.025] = 0.166,$$
$$n_0 = 8.626,\quad \bar{n} = 7.563,$$
$$F^* = 5.196,\quad MS_u = 15.591,$$
$$L = -0.036,\quad \text{and}\quad U = 3.661.$$
Substituting these values in (2.12.4), the desired 95 percent confidence interval for $\sigma_\alpha^2$ is given by
$$P[-0.030 < \sigma_\alpha^2 < 11.985] = 0.95.$$


Furthermore, to obtain an approximate confidence interval for $\sigma_\alpha^2/\sigma_e^2$, we substitute the required quantities in (2.12.5), which yields the desired interval as
$$P\left[-0.036 < \frac{\sigma_\alpha^2}{\sigma_e^2} < 3.661\right] = 0.95.$$
It is to be understood that negative limits are defined to be zero. It is, however, informative to leave them with negative signs.
Similarly, to obtain an approximate 95 percent confidence interval for the intraclass correlation, we have
$$F^* = 5.196,\quad F[5, 47; 0.025] = 0.163,\quad \text{and}\quad F[5, 47; 0.975] = 2.851.$$
Substituting these values in (2.12.6), the desired 95 percent confidence interval for the intraclass correlation is given by
$$P\left[\frac{(5.196/2.851) - 1}{(5.196/2.851) + (8.626 - 1)} < \frac{\sigma_\alpha^2}{\sigma_e^2 + \sigma_\alpha^2} < \frac{(5.196/0.163) - 1}{(5.196/0.163) + (8.626 - 1)}\right] \doteq 0.95$$
or
$$P\left[0.087 < \frac{\sigma_\alpha^2}{\sigma_e^2 + \sigma_\alpha^2} < 0.782\right] \doteq 0.95.$$
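Both limits apply the same transformation $r \mapsto (r - 1)/(r + n_0 - 1)$ to a ratio $r = F^*/F$. The Python sketch below (hypothetical function name) makes that explicit. With $n_0$ replaced by the common group size $n$, the same function also reproduces the balanced interval found earlier for the interview ratings.

```python
def intraclass_ci(f_star, f_lower, f_upper, n0):
    """Interval for rho = sigma_alpha^2 / (sigma_e^2 + sigma_alpha^2), as displayed above.

    f_lower/f_upper are F points with (a - 1, N - a) d.f.; for balanced data
    n0 is simply the common group size n.
    """
    def transform(ratio):
        # maps r = F*/F into the corresponding bound on rho
        return (ratio - 1) / (ratio + n0 - 1)

    return transform(f_star / f_upper), transform(f_star / f_lower)


print(intraclass_ci(5.196, 0.163, 2.851, 8.626))   # corn yields (Table 2.10)
print(intraclass_ci(5.753, 0.116, 3.804, 4))       # interview ratings (balanced)
```

The balanced call returns approximately (0.114, 0.924), the interval for $\sigma_\alpha^2/(\sigma_e^2 + \sigma_\alpha^2)$ given for the interview data.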


2.15 USE OF STATISTICAL COMPUTING PACKAGES 


One-way analysis of variance can be performed by a number of statistical packages using either a mainframe or a microcomputer. SAS, SPSS, and BMDP each contain various procedures to perform one-way analysis of variance. However, PROC ANOVA of SAS, ONEWAY of SPSS, and BMDP 7D are better suited to simple one-way designs, including both balanced and unbalanced data sets. For the random effects model involving variance component estimation, one may prefer to use SAS GLM, SPSS GLM, and BMDP 8V or 3V. The output from these procedures provides an analysis of variance table, treatment means, and their standard errors. BMDP 7D has the extra feature of printing comparative histograms and all descriptive measures of location and variability for each group and for the combined data. This provides an effective visual aid in making comparisons between different group means. To obtain similar descriptive measures using SAS, one can use the MEANS statement in GLM or the UNIVARIATE procedure for each group and for the combined data. For an introduction to SAS, SPSS, and BMDP procedures for performing analysis of variance, see Chapter 11.


2.16 WORKED EXAMPLES USING STATISTICAL PACKAGES 


In this section, we illustrate the applications of statistical packages to perform 
one-way analysis of variance for the data sets employed in examples presented 
in Sections 2.13 and 2.14. Figures 2.2, 2.3, 2.4, and 2.5 illustrate the program 
instructions and the output results for analyzing data in Tables 2.3, 2.5, 2.7, and 
2.9 using SAS ANOVA/GLM, SPSS ONEWAY/GLM, and BMDP 7D/8V/3V 
procedures. The typical output provides the data format listed at the top followed 
by the number of observations for each factor level, estimates of the factor level 
means, and the entries of the analysis of variance table. Note that in each case the 
results are the same as those obtained using manual computations in Sections 
2.13 and 2.14. 


2.17 POWER OF THE ANALYSIS OF VARIANCE F TEST 


The power of the F test of the analysis of variance is important in evaluating the sensitivity of the test and also in determining the sample size needed to attain a given value of the power. We recall that the power of a test refers to the probability that the decision procedure will reject the null hypothesis when in fact the null hypothesis is false. We now illustrate the calculation of the power of the F test for Models I and II separately.


(i) SAS application: SAS ANOVA instructions and output for the one-way fixed effects analysis of variance.

    DATA WHEATYLD;
    INPUT VARIETY YIELD;
    DATALINES;
    1 96
    1 37
    . . .
    ;
    PROC ANOVA;
    CLASSES VARIETY;
    MODEL YIELD=VARIETY;
    RUN;

    The SAS System
    Analysis of Variance Procedure
    CLASS      LEVELS   VALUES
    VARIETY    4        1 2 3 4
    NUMBER OF OBS. IN DATA SET=24
    Dependent Variable: YIELD
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              3        2940.0000      980.0000      5.99   0.0044
    Error             20        3272.0000      163.6000
    Corrected Total   23        6212.0000
    R-Square   C.V.       Root MSE   YIELD Mean
    0.473278   16.82977   12.791     76.000
    Source    DF   Anova SS    Mean Square   F Value   Pr > F
    VARIETY    3   2940.0000      980.0000      5.99   0.0044

(ii) SPSS application: SPSS ONEWAY instructions and output for the one-way fixed effects analysis of variance.

    DATA LIST
      /VARIETY 1 . . .
    . . .
    ONEWAY YIELD BY VARIETY (1,4)
      /STATISTICS=ALL.

    Test of Homogeneity of Variances
    Levene Statistic   df1   df2
    1.402              3     20
    ANOVA
                     Sum of Squares   df   Mean Square
    Between Groups         2940.000    3       980.000
    Within Groups          3272.000   20       163.600
    Total                  6212.000   23

(iii) BMDP application: BMDP 7D instructions and output for the one-way fixed effects analysis of variance.

    /INPUT      FILE='C:\SAHAI\TEXTO\EJE1.TXT'.
                FORMAT=FREE.
                VARIABLES=2.
    /VARIABLE   NAMES=VART, YIELD.
    /GROUP      CODES(VART)=1, 2, 3, 4.
                NAMES(VART)=I, II, III, IV.
    /HISTOGRAM  GROUPING=VART.
                VARIABLE=YIELD.
    /END
    1 96
    1 37
    . . .
    4 68

    BMDP7D - ONE- AND TWO-WAY ANALYSIS OF VARIANCE WITH DATA SCREENING
    Release: 7.0 (BMDP/DYNAMIC)
    ANALYSIS OF VARIANCE TABLE FOR MEANS
    SOURCE     SUM OF SQUARES   DF   MEAN SQUARE   F VALUE
    VARIETY         2940.0000    3      980.0000      5.99
    ERROR           3272.0000   20      163.6000
    EQUALITY OF MEANS TESTS; VARIANCES ARE NOT ASSUMED TO BE EQUAL
    WELCH            8.94   0.0028
    BROWN-FORSYTHE          0.0113
    LEVENE'S TEST FOR VARIANCES   3, 20

FIGURE 2.2 Program Instructions and Output for the One-Way Fixed Effects Analysis of Variance: Data on Yields of Four Different Varieties of Wheat (Table 2.3).


(i) SAS application: SAS ANOVA instructions and output for the one-way fixed effects analysis of variance with unequal numbers of observations.

    DATA BLOODANA;
    INPUT DRUG BLOODPH;
    DATALINES;
    1 19
    1 11
    . . .
    ;
    PROC ANOVA;
    CLASSES DRUG;
    MODEL BLOODPH=DRUG;
    RUN;

    The SAS System
    Analysis of Variance Procedure
    CLASS   LEVELS   VALUES
    DRUG    5        1 2 3 4 5
    NUMBER OF OBS. IN DATA SET=17
    Dependent Variable: BLOODPH
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              4        258.00000      64.50000     10.18   0.0008
    Error             12         76.00000       6.33333
    Corrected Total   16        334.00000
    R-Square   C.V.       Root MSE   BLOODPH Mean
    0.772455   35.95159   2.5166     7.0000
    Source   DF   Anova SS    Mean Square   F Value   Pr > F
    DRUG      4   258.00000      64.50000     10.18   0.0008

(ii) SPSS application: SPSS ONEWAY instructions and output for the one-way fixed effects analysis of variance with unequal numbers of observations.

    DATA LIST
      /DRUG 1
       BLOODPH 3-4.
    BEGIN DATA.
    1 19
    1 11
    1 15
    . . .
    END DATA.
    ONEWAY BLOODPH BY DRUG (1,5)
      /STATISTICS=ALL.

    Test of Homogeneity of Variances
    BLOODPH
    Levene Statistic   df1   df2
    .448               4     12
    ANOVA
    BLOODPH
                     Sum of Squares   df
    Between Groups          258.000    4
    Within Groups            76.000   12
    Total                   334.000   16

(iii) BMDP application: BMDP 7D instructions and output for the one-way fixed effects analysis of variance with unequal numbers of observations.

    /INPUT      FILE='C:\SAHAI\TEXTO\EJE2.TXT'.
                FORMAT=FREE.
                VARIABLES=2.
    /VARIABLE   NAMES=DRUG, BLOOPH.
    /GROUP      CODES(DRUG)=1, 2, 3, 4, 5.
                NAMES(DRUG)=A, B, C, D, E.
    /HISTOGRAM  GROUPING=DRUG.
                VARIABLE=BLOOPH.
    /END
    1 19
    . . .

    BMDP7D - ONE- AND TWO-WAY ANALYSIS OF VARIANCE WITH DATA SCREENING
    Release: 7.0 (BMDP/DYNAMIC)
    ANALYSIS OF VARIANCE TABLE FOR MEANS
    SOURCE   SUM OF SQUARES   DF   MEAN SQUARE   F VALUE   PROB.
    DRUG           258.0000    4       64.5000     10.18   0.0008
    ERROR           76.0000   12
    EQUALITY OF MEANS TESTS; VARIANCES ARE NOT ASSUMED TO BE EQUAL
    WELCH
    BROWN-FORSYTHE

FIGURE 2.3 Program Instructions and Output for the One-Way Fixed Effects Analysis of Variance with Unequal Numbers of Observations: Data on Blood Analysis of Animals Injected with Five Drugs (Table 2.5).


(i) SAS application: SAS GLM instructions and output for the one-way random effects analysis of variance.

    DATA INTERRAT;
    INPUT STAFF RESP;
    DATALINES;
    1 86
    1 75
    . . .
    ;
    PROC GLM;
    CLASSES STAFF;
    MODEL RESP=STAFF;
    RANDOM STAFF;
    RUN;

    The SAS System
    General Linear Models Procedure
    CLASS   LEVELS   VALUES
    STAFF   5        1 2 3 4 5
    NUMBER OF OBS. IN DATA SET=20
    Dependent Variable: RESP
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              4        1683.3000      420.8250      5.75   0.0052
    Error             15        1097.2500       73.1500
    Corrected Total   19        2780.5500
    R-Square   C.V.       Root MSE   RESP Mean
    0.605384   10.51356   8.5528     81.350
    Source   DF   Type I SS     Mean Square   F Value   Pr > F
    STAFF     4   1683.3000        420.8250      5.75   0.0052
    Source   DF   Type III SS   Mean Square   F Value   Pr > F
    STAFF     4   1683.3000        420.8250      5.75   0.0052
    Source   Type III Expected Mean Square
    STAFF    Var(Error) + 4 Var(STAFF)

(ii) SPSS application: SPSS GLM instructions and output for the one-way random effects analysis of variance.

    DATA LIST
      /STAFF 1
       RESP 3-4.
    BEGIN DATA.
    1 86
    1 75
    1 94
    1 86
    2 67
    2 86
    . . .
    5 91
    END DATA.
    GLM RESP BY STAFF
      /DESIGN STAFF
      /RANDOM STAFF.

    Tests of Between-Subjects Effects   Dependent Variable: RESP
    Source                 Type III SS   df   Mean Square   F       Sig.
    STAFF    Hypothesis       1683.300    4       420.825   5.753   .005
             Error            1097.250   15     73.150(a)
    a MS(Error)
    Expected Mean Squares(a,b)
                Variance Component
    Source      Var(STAFF)   Var(Error)
    STAFF       4.000        1.000
    Error       .000         1.000
    a For each source, the expected mean square equals the sum of the coefficients in the cells times the variance components, plus a quadratic term involving effects in the Quadratic Term cell.
    b Expected Mean Squares are based on the Type III Sums of Squares.

(iii) BMDP application: BMDP 8V instructions and output for the one-way random effects analysis of variance.

    /INPUT      FILE='C:\SAHAI\TEXTO\EJE3.TXT'.
                FORMAT=FREE.
                VARIABLES=4.
    /VARIABLE   NAMES=R1, R2, R3, R4.
    /DESIGN     NAMES=STAFF, RESP.
                LEVELS=5, 4.
                RANDOM=STAFF, RESP.
                MODEL='S, R(S)'.
    /END
    86 75 94 86
    . . .
    85 84 92 91

    BMDP8V - GENERAL MIXED MODEL ANALYSIS OF VARIANCE - EQUAL CELL SIZES
    Release: 7.0 (BMDP/DYNAMIC)
    ANALYSIS OF VARIANCE DESIGN
    INDEX              STAFF   RESP
    NUMBER OF LEVELS   5       4
    POPULATION SIZE    INF     INF
    MODEL              S, R(S)
    ANALYSIS OF VARIANCE FOR DEPENDENT VARIABLE 1
    SOURCE   ERROR TERM   SUM OF SQUARES   MEAN SQUARE   F        PROB.
    MEAN     STAFF        132356.450       132356.5      314.52   0.0001
    STAFF    R(S)         1683.300         420.8         5.75     0.0052
    R(S)                  1097.250         73.2
    SOURCE   EXPECTED MEAN SQUARE   ESTIMATES OF VARIANCE COMPONENTS
    MEAN     20(1) + 4(2) + (3)     6596.78125
    STAFF    4(2) + (3)             86.91875
    R(S)     (3)                    73.15000

FIGURE 2.4 Program Instructions and Output for the One-Way Random Effects Analysis of Variance: Data on Interview Ratings by Five Staff Members (Table 2.7).


(i) SAS application: SAS GLM instructions and output for the one-way random effects analysis of variance with unequal numbers of observations.

    DATA CORNVARI;
    INPUT VARIETY YIELD;
    DATALINES;
    . . .
    ;
    PROC GLM;
    CLASSES VARIETY;
    MODEL YIELD = VARIETY;
    RANDOM VARIETY;
    RUN;

    The SAS System
    General Linear Models Procedure
    CLASS     LEVELS   VALUES
    VARIETY   6        1 2 3 4 5 6
    NUMBER OF OBS. IN DATA SET=53
    Dependent Variable: YIELD
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              5        87.767041     17.553408      5.20   0.0007
    Error             47       158.748431      3.377626
    Corrected Total   52       246.515472
    R-Square   C.V.       Root MSE   YIELD Mean
    0.356031   24.72838   1.8378     7.4321
    Source    DF   Type I SS     Mean Square   F Value   Pr > F
    VARIETY    5   87.767041       17.553408      5.20   0.0007
    Source    DF   Type III SS   Mean Square   F Value   Pr > F
    VARIETY    5   87.767041       17.553408      5.20   0.0007
    Source    Type III Expected Mean Square
    VARIETY   Var(Error) + 8.6264 Var(VARIETY)

(ii) SPSS application: SPSS GLM instructions and output for the one-way random effects analysis of variance with unequal numbers of observations.

    DATA LIST
      /VARIETY 1
       YIELD 3-6(1).
    BEGIN DATA.
    . . .
    END DATA.
    GLM YIELD BY VARIETY
      /DESIGN VARIETY
      /RANDOM VARIETY.

    Tests of Between-Subjects Effects   Dependent Variable: YIELD
    Source                  Type III SS   df   Mean Square   F       Sig.
    VARIETY   Hypothesis         87.767    5        17.553   5.197   .001
              Error             158.748   47      3.378(a)
    a MS(Error)
    Expected Mean Squares(a,b)
                Variance Component
    Source      Var(VARIETY)   Var(Error)
    VARIETY     8.626          1.000
    Error       .000           1.000
    a For each source, the expected mean square equals the sum of the coefficients in the cells times the variance components, plus a quadratic term involving effects in the Quadratic Term cell.
    b Expected Mean Squares are based on the Type III Sums of Squares.

(iii) BMDP application: BMDP 3V instructions and output for the one-way random effects analysis of variance with unequal numbers of observations.

    /INPUT      FILE='C:\SAHAI\TEXTO\EJE4.TXT'.
                FORMAT=FREE.
                VARIABLES=2.
    /VARIABLE   NAMES=VAR, YIELD.
    /GROUP      CODES(VAR)=1, 2, 3, 4, 5, 6.
                NAMES(VAR)=F, S, I, L, O, C.
    /DESIGN     DEPENDENT=YIELD.
                RANDOM=VAR.
                METHOD=REML.
    /END

    BMDP3V - GENERAL MIXED MODEL ANALYSIS OF VARIANCE
    Release: 7.0 (BMDP/DYNAMIC)
    DEPENDENT VARIABLE YIELD
    PARAMETER   ESTIMATE   STANDARD ERROR   EST/ST.DEV.   TWO-TAIL PROB. (ASYM. THEORY)
    ERR. VAR.   3.377      0.696
    CONSTANT    7.231      0.584            12.376        0.000
    RAND(1)     1.619      1.294
    TESTS OF FIXED EFFECTS BASED ON ASYMPTOTIC VARIANCE-COVARIANCE MATRIX
    SOURCE     F-STATISTIC   DEGREES OF FREEDOM   PROBABILITY
    CONSTANT   153.18        1   52               0.00000

FIGURE 2.5 Program Instructions and Output for the One-Way Random Effects Analysis of Variance with Unequal Numbers of Observations: Data on Yields of Six Varieties of Corn (Table 2.9).


MODEL I (FIXED EFFECTS)


It follows from (2.6.3) and (2.6.6) that the power of the F test for the hypothesis $H_0: \alpha_i = 0\ (i = 1, \ldots, a)$ is given by
$$1 - \beta = P\{F'[a - 1, a(n - 1); \lambda] > F[a - 1, a(n - 1); 1 - \alpha]\}, \qquad (2.17.1)$$
where $F[a - 1, a(n - 1); 1 - \alpha]$ is the $100(1 - \alpha)$th percentage point of the F distribution with $a - 1$ and $a(n - 1)$ degrees of freedom, and $F'[a - 1, a(n - 1); \lambda]$ is a statistic having a noncentral F distribution (see Appendix G) with $a - 1$ and $a(n - 1)$ degrees of freedom and noncentrality parameter $\lambda$ given by
$$\lambda = \frac{n}{2\sigma_e^2}\sum_{i=1}^{a}\alpha_i^2. \qquad (2.17.2)$$
When there are unequal numbers of observations ($n_i$) at different factor levels, the error degrees of freedom $a(n - 1)$ are replaced by $N - a$, and the noncentrality parameter takes the form
$$\lambda = \frac{1}{2\sigma_e^2}\sum_{i=1}^{a} n_i\alpha_i^2. \qquad (2.17.3)$$
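A small Python sketch of (2.17.3) follows. It assumes the effects are expressed as deviations satisfying $\sum n_i\alpha_i = 0$, and its function name and the balanced numbers in the usage line are hypothetical. It also checks that, with equal $n_i = n$, (2.17.3) reduces to (2.17.2).

```python
def noncentrality(effects, sizes, sigma2_e):
    """lambda = sum(n_i * alpha_i**2) / (2 * sigma_e^2), as in (2.17.3).

    Assumes the effects satisfy sum(n_i * alpha_i) = 0.
    """
    return sum(n * e * e for n, e in zip(sizes, effects)) / (2 * sigma2_e)


# Hypothetical balanced illustration: a = 3 groups of n = 5, sigma_e^2 = 4
effects, n, sigma2_e = [-2.0, 0.0, 2.0], 5, 4.0
lam_general = noncentrality(effects, [n] * len(effects), sigma2_e)       # (2.17.3)
lam_balanced = n / (2 * sigma2_e) * sum(e * e for e in effects)          # (2.17.2)
print(lam_general, lam_balanced)
```

Note that $\lambda$ here is half of the quantity $\sum n_i\alpha_i^2/\sigma_e^2$ that some texts call the noncentrality parameter, so values should not be mixed across conventions.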


Remark: To evaluate the probability expression given by (2.17.1), one needs to employ noncentral F tables, which involve evaluation of the integrals for the noncentral F distributions. Such tables of power, or of the probability expression (2.17.1), have been calculated by Tang (1938) and by Tiku (1967, 1972). Tabulating the power for each $\alpha$ would require a triple-entry table in $\nu_1$, $\nu_2$, and $\lambda$. Tang's tables give the probability of a type II error, $\beta$, corresponding to the degrees of freedom $\nu_1 = 1(1)8$ and $\nu_2 = 2, 4, 6(1)30, 60, \infty$; the normalized noncentrality parameter $\phi = \{2\lambda/(\nu_1 + 1)\}^{1/2} = 1(0.5)3(1)8$; and the levels of significance $\alpha = 0.05$ and $0.01$. The computations of these probabilities are based upon the incomplete noncentral beta distribution. These tables are reproduced in Graybill (1961, pp. 444-459). Tiku's tables give the power corresponding to $\nu_1 = 1(1)10, 12$; $\nu_2 = 2(2)30, 40, 60, 120, \infty$; $\phi = 0.5, 1.0(0.2)2.2(0.4)3.0$; and $\alpha = 0.005, 0.01, 0.025, 0.05$, and $0.10$, and are also reproduced in Graybill (1976, pp. 672-686). An abridged version of these tables, for $\alpha = 0.01, 0.05$, and $0.10$, is also reprinted in this volume as Appendix Table VII. Among other tables, Lehmer (1944) calculated $\phi$ as a function of $(\alpha, 1 - \beta, \nu_1, \nu_2)$ for $\alpha = 0.01, 0.05$ and $1 - \beta = 0.7, 0.8$. More extensive tables, which also include the Lehmer tables, were published by the National Bureau of Standards, Washington, DC, in 1960 for $\alpha = 0.01, 0.02, 0.05, 0.10$; $1 - \beta = 0.10, 0.50, 0.90, 0.95, 0.99$, except $(\alpha, \beta) = (0.10, 0.10), (0.20, 0.10)$; $\nu_1 = 1(1)10, 12, 15, 20, 24, 30, 40, 60, 120, \infty$; and $\nu_2 = 2(2)12, 20, 24, 30, 40, 60, \infty$.


In addition to the tables previously described, the computation of the power of the F test is facilitated by the availability of power charts, prepared by Pearson and Hartley (1951) and Fox (1956), which make calculation of the probabilities (2.17.1) quite simple. The Pearson-Hartley charts, reproduced in Appendix Chart II, are used as follows:

(i) The charts are given for $\nu_1 = 1, 2, \ldots, 8$, the value of which is shown in the upper left-hand corner of each chart.

(ii) Two levels of significance, namely, $\alpha = 0.05$ and $\alpha = 0.01$, are given in the charts.

(iii) There are two X-scales (abscissas) corresponding to the two significance levels used. The left set of curves on each chart corresponds to $\alpha = 0.05$, and the right set refers to $\alpha = 0.01$.

(iv) Separate curves are provided for different values of $\nu_2$. For each chart, the values of $\nu_2$ are given at the top of the chart. Because only selected values of $\nu_2$ are provided in the charts, interpolation is generally required for intermediate values of $\nu_2$. Curves are provided for $\nu_2 = 6, 7, 8, 9, 10, 12, 15, 20, 30, 60, \infty$.

(v) The X-scale (abscissa) represents $\phi$, the normalized noncentrality parameter, and the Y-scale (ordinate) represents the desired power $(1 - \beta)$ as defined in (2.17.1).


Remarks: (i) For the calculation of the power (1 — 8), we need to know the value of ¢ 
beforehand (whose exact value is unknown). To estimate the value of ¢, one requires the 
knowledge of w;’s and the error variance o?. An estimate of a? can be obtained based on 
previous experimentation or a pilot study. If several estimates are available, the largest 
one should be taken. The larger the value of o7, the larger will be the value of n required 
to achieve a given level of power. 

(ii) A special condensation of the Pearson and Hartley charts was published by Duncan 
(1957), who plotted on a single set of axes the values of @ corresponding to 1 — B = 0.5 
and 0.90 for various values of v,; and v2. Separate charts were presented for a = 0.05 
and 0.01. Having v; and v2 on the same chart facilitates the computations involving both 
of these degrees of freedom. 

(iii) The Fox charts, constructed using the Tang and Lehmer tables, are useful in determining the design parameters (combinations of $\nu_1$ and $\nu_2$ values) needed to obtain a desired power against a specified alternative. The charts consist of the following:

(a) In a $(\nu_1, \nu_2)$-plane and for fixed values of $\alpha$ and $\beta$, the charts give the contours of $\phi$, that is, curves on which $\phi$ has certain constant values.

(b) Because a reciprocal scale is used for $\nu_1$ and $\nu_2$, these curves appear as nearly straight lines.

(c) The curves are arranged in eight separate charts for $\alpha = 0.01$ and $0.05$ and $1 - \beta = 0.5, 0.7, 0.8$, and $0.9$.

(d) Two nomograms are included among the Fox charts. They are designed to facilitate interpolation to values of $1 - \beta$ other than 0.5, 0.7, 0.8, and 0.9.


The Fox charts, reprinted in Scheffé (1959, pp. 446-455), are used as follows. For a given pair of values $(\beta, \phi)$, the point corresponding to this pair is located in each of the two grids, and the straight line drawn through these two points is then the (approximate) contour of $\phi$.

(iv) The Pearson and Hartley charts and the Fox charts serve somewhat complementary roles: the former are designed for $\nu_1 = 1, 2, \ldots, 8$, and the latter for $\nu_1 = 3, 4, \ldots, \infty$.


Example 1. Consider the pharmaceutical research example described in Section 2.13 and suppose that the company wishes to know the power of the F test of the experiment when there are substantial differences between mean blood pH readings for different drugs. More specifically, suppose one wishes to consider the case when $\alpha_1 = 8$, $\alpha_2 = -3$, $\alpha_3 = -1$, $\alpha_4 = 0$, and $\alpha_5 = -3$. From (2.17.3), the value of the noncentrality parameter $\lambda$ is

$$\lambda = \frac{1}{2\sigma_e^2}\sum_{i=1}^{5} n_i\alpha_i^2 = \frac{1}{2\sigma_e^2}\left[3(8)^2 + 4(-3)^2 + 3(-1)^2 + 4(0)^2 + 3(-3)^2\right] = \frac{1}{2\sigma_e^2}(258.0), \tag{2.17.4}$$

where an estimate of $\sigma_e^2$ is obtained from $MS_W = 6.333$. Substituting this value of $\sigma_e^2$ in (2.17.4), we get

$$\lambda = \frac{258.0}{2(6.333)} = 20.37,$$

and the normalized noncentrality parameter is

$$\phi = \sqrt{\frac{2\lambda}{a}} = \sqrt{\frac{2(20.37)}{5}} = 2.85.$$

Furthermore, for this example, we have $\nu_1 = 4$ and $\nu_2 = 12$. Hence, for $\alpha = 0.01$, we find from the Pearson-Hartley charts given in Appendix Chart II that the power is approximately 0.94. The use of Tiku's tables given in Appendix Table VII with appropriate interpolation gives essentially the same result. Thus, there are about 94 chances in 100 that the F test will detect the differences in the mean blood pH readings of the five drugs, given the specified magnitudes of these differences.
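The arithmetic of Example 1 can be checked with a short Python sketch. It assumes, as in (2.17.4), that $\lambda = \sum n_i\alpha_i^2/(2\sigma_e^2)$, and that the chart abscissa is $\phi = \sqrt{2\lambda/a}$; the function names are illustrative only.

```python
import math


def noncentrality(n, alpha, sigma2):
    """lambda = sum(n_i * alpha_i^2) / (2 * sigma_e^2), as in (2.17.4)."""
    return sum(ni * ai ** 2 for ni, ai in zip(n, alpha)) / (2.0 * sigma2)


def normalized_phi(lam, a):
    """phi = sqrt(2 * lambda / a), where a = nu_1 + 1 is the number of groups."""
    return math.sqrt(2.0 * lam / a)


# Example 1: five drugs, group sizes and hypothesized effects from the text
n = [3, 4, 3, 4, 3]
alpha = [8, -3, -1, 0, -3]
assert sum(ni * ai for ni, ai in zip(n, alpha)) == 0   # weighted effects sum to zero

lam = noncentrality(n, alpha, sigma2=6.333)   # MS_W used as the estimate of sigma_e^2
phi = normalized_phi(lam, a=len(n))
print(round(lam, 2), round(phi, 2))
```

With $MS_W$ substituted for $\sigma_e^2$ this gives $\lambda \approx 20.37$ and $\phi \approx 2.85$; the power itself still has to be read from the charts or tables.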


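In addition to the tables and charts described above, one can use a normal approximation to the distribution of the square root of the noncentral F variate to evaluate the power expression (2.17.1). For example, it can be shown (see, e.g., Johnson et al. (1995, pp. 491-492)) that

$$Z = \frac{\sqrt{\nu_1\nu_2^{-1}(2\nu_2 - 1)F[\nu_1, \nu_2; \lambda]} - \sqrt{2(\nu_1 + 2\lambda) - \dfrac{\nu_1 + 4\lambda}{\nu_1 + 2\lambda}}}{\sqrt{\nu_1\nu_2^{-1}F[\nu_1, \nu_2; \lambda] + \dfrac{\nu_1 + 4\lambda}{\nu_1 + 2\lambda}}} \tag{2.17.5}$$

is approximately normally distributed with mean zero and standard deviation one.¹⁴ Applying the normal approximation (2.17.5) to equation (2.17.1), it follows that (see, e.g., Fleiss (1986, pp. 372-373); Day and Graham (1991))

$$1 - \beta \approx P(Z \le z_{1-\beta}),$$

where

$$z_{1-\beta} = \frac{\sqrt{\nu_2\left[2(\nu_1 + 2\lambda)^2 - (\nu_1 + 4\lambda)\right]} - \sqrt{\nu_1(\nu_1 + 2\lambda)(2\nu_2 - 1)F[\nu_1, \nu_2; 1 - \alpha]}}{\sqrt{\nu_1(\nu_1 + 2\lambda)F[\nu_1, \nu_2; 1 - \alpha] + \nu_2(\nu_1 + 4\lambda)}}.$$

That is, $z_{1-\beta}$ is minus the value of (2.17.5) at $F[\nu_1, \nu_2; \lambda] = F[\nu_1, \nu_2; 1 - \alpha]$, with numerator and denominator multiplied by $\sqrt{\nu_2(\nu_1 + 2\lambda)}$.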


Example 2. For Example 1 considered previously, $\nu_1 = 4$, $\nu_2 = 12$, $\lambda = 20.37$, $\alpha = 0.01$, $F[4, 12; 0.99] = 5.41$, and $z_{1-\beta}$ is determined by

$$z_{1-\beta} = \frac{\sqrt{12\left[2(4 + 2\times 20.37)^2 - (4 + 4\times 20.37)\right]} - \sqrt{4(4 + 2\times 20.37)(2\times 12 - 1)\times 5.41}}{\sqrt{4(4 + 2\times 20.37)\times 5.41 + 12(4 + 4\times 20.37)}} = 1.51.$$

Now, from Appendix Table I, the power is given by

$$1 - \beta \approx P(Z < 1.51) = 0.93,$$

which is nearly the same value as that obtained earlier using the Pearson-Hartley charts and Tiku's tables.
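The normal approximation is easy to evaluate directly. A minimal sketch in Python using only the standard library (`statistics.NormalDist` supplies the normal CDF); the critical value $F[4, 12; 0.99] = 5.41$ is taken from the tables rather than computed, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def z_one_minus_beta(nu1, nu2, lam, f_crit):
    """Normal-approximation quantile z_{1-beta} from the expression following (2.17.5).

    lam is the noncentrality parameter in the convention of (2.17.4);
    f_crit is the tabulated F[nu1, nu2; 1 - alpha].
    """
    c = nu1 + 2 * lam          # nu1 + 2*lambda
    d = nu1 + 4 * lam          # nu1 + 4*lambda
    num = math.sqrt(nu2 * (2 * c ** 2 - d)) - math.sqrt(nu1 * c * (2 * nu2 - 1) * f_crit)
    den = math.sqrt(nu1 * c * f_crit + nu2 * d)
    return num / den


z = z_one_minus_beta(nu1=4, nu2=12, lam=20.37, f_crit=5.41)
power = NormalDist().cdf(z)    # 1 - beta ~ P(Z <= z_{1-beta})
print(f"z = {z:.2f}")
```

For Example 2 this returns $z_{1-\beta} \approx 1.51$ and a power of about 0.93.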


MODEL II (RANDOM EFFECTS) 


Under Model II, it follows from (2.7.10) that the power of the F test for the null hypothesis $H_0: \sigma_\alpha^2 = 0$ is given by

$$1 - \beta = P\left\{F[a-1, a(n-1)] > \frac{F[a-1, a(n-1); 1-\alpha]}{1 + n\sigma_\alpha^2/\sigma_e^2}\right\}. \tag{2.17.6}$$

Thus, in the case of Model II, the power of the F test depends only upon the (central) F distribution and is, therefore, more readily calculable than the power under Model I, which involves the noncentral F distribution. Furthermore, it is readily seen that the power of the F test for the more general hypothesis (2.7.11) will be given by

$$P\left\{F[a-1, a(n-1)] > \frac{1 + n\rho_0}{1 + n\sigma_\alpha^2/\sigma_e^2}\,F[a-1, a(n-1); 1-\alpha]\right\}, \tag{2.17.7}$$

which again involves only the central F distribution.¹⁵

To simplify the computation of probabilities (2.17.6) and (2.17.7), curves giving $1 - \text{power}$ have been drawn (see, e.g., Bowker and Lieberman (1972, pp. 309-313)). These are similar to the Pearson-Hartley charts for the fixed effects case described earlier. These curves are reproduced in Appendix Chart III.

14 More accurate approximations of the noncentral F variate to evaluate the power expression (2.17.1) can be based on the central F distribution. For a discussion of these and some other approximations to the noncentral F distribution, see Johnson et al. (1995, pp. 491-495).


Example 3. Suppose that $a = 4$, $n = 6$, $\sigma_\alpha^2/\sigma_e^2 = 2.5$, and $\alpha$ is taken to be 0.05. From Appendix Table V, we then find that $F[3, 20; 0.95] = 3.10$, and using (2.17.6) the power of the test is given by

$$\begin{aligned} 1 - \beta &= P\{F[3, 20] > 3.10/(1 + 6 \times 2.5)\} \\ &= P\{F[3, 20] > 0.19\} \\ &= 0.90. \end{aligned}$$
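Because (2.17.6) and (2.17.7) involve only the central F distribution, the power can be computed without charts. The sketch below assumes no SciPy: the central F CDF is obtained from the regularized incomplete beta function, evaluated with the standard continued-fraction (modified Lentz) method. With SciPy available, `scipy.stats.f.sf(cutoff, a - 1, a * (n - 1))` would give the same tail probability. All function names are illustrative.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even and odd steps of the continued fraction
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:   # last factor from the odd step
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(x, d1, d2):
    """P(F[d1, d2] <= x) for the central F distribution."""
    if x <= 0.0:
        return 0.0
    return betainc_reg(d1 / 2.0, d2 / 2.0, d1 * x / (d1 * x + d2))


def model2_power(a, n, ratio, f_crit, rho0=0.0):
    """Power of the Model II F test: (2.17.7), which reduces to (2.17.6) when rho0 = 0.

    ratio is sigma_alpha^2 / sigma_e^2; f_crit is the tabulated F[a-1, a(n-1); 1-alpha].
    """
    cutoff = (1.0 + n * rho0) / (1.0 + n * ratio) * f_crit
    return 1.0 - f_cdf(cutoff, a - 1, a * (n - 1))


# Example 3: a = 4, n = 6, sigma_alpha^2/sigma_e^2 = 2.5, F[3, 20; 0.95] = 3.10
power = model2_power(a=4, n=6, ratio=2.5, f_crit=3.10)
print(f"power = {power:.2f}")
```

Passing a nonzero `rho0` evaluates (2.17.7) for the more general hypothesis in the same way.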


2.18 POWER AND DETERMINATION OF SAMPLE SIZE 


We have seen in the preceding section that under Model I, once $\sum_{i=1}^a \alpha_i^2$ and $\sigma_e^2$ have been specified, the power of the F test becomes merely a function of $n$. By making $n$ suitably large, the power of the F test can be made correspondingly large for any nonzero specified value of $\sum_{i=1}^a \alpha_i^2/\sigma_e^2$. That is, for a fixed value of $\sum_{i=1}^a \alpha_i^2/\sigma_e^2$, as $n$ increases, $\phi$ also increases, and the larger $\phi$ is, for a fixed level of significance $\alpha$, the larger is the power. Therefore, the sample size $n$ can be determined so as to make the power of the F test sufficiently large, say, 0.80 or 0.90, with respect to the specified values of $\sum_{i=1}^a \alpha_i^2$ and $\sigma_e^2$. Equivalently, the sample size can be determined so as to make the experiment sensitive enough to detect differences in the parameters $\alpha_i$ that are considered large enough to be of practical importance. As mentioned earlier, an estimate of $\sigma_e^2$ can be obtained either from a pilot survey or from previous experimentation. However, specifying $\sum_{i=1}^a \alpha_i^2$ is rather difficult. Generally, $\sum_{i=1}^a \alpha_i^2$ should be taken as the smallest value that is considered to be of practical importance; an expert judgment supported by a reasonable rationale is generally required.

15 For discussions of the power function under nonnormality, see Tan and Wong (1980) and Singhal and Sahai (1994). For a discussion of a general approach to power calculations in the most frequently encountered statistical applications, including a fixed effects analysis of variance model, without reference to any tables and charts, see Wheeler (1974).
Example 1. Consider an example with $a = 3$ and the overall F test to be made at the level of significance $\alpha = 0.05$. From tables of the noncentral F distribution, the power of the overall test may be determined for various combinations of $\phi$ and $n$. Suppose the minimum value of $\alpha_i$ thought to be of practical importance is 3.0. If $\sigma_e^2$ is estimated to be 25, then

$$\phi = \sqrt{\frac{n(3^2 + 3^2 + 3^2)}{3(25)}} = 0.6\sqrt{n}.$$

Hence, for this special case, using Appendix Table VII, the values of $\phi$ associated with various values of $n$ and the corresponding power are determined as follows:

$n$:            3      5      7      9      11     21
$\phi$:         1.0    1.3    1.6    1.8    2.0    2.7
$1 - \beta$:    0.21   0.31   0.62   0.75   0.85   0.99

If a power of 0.80 is considered to be appropriate, then letting $n = 10$ will provide a test having approximately this power. If a power of 0.90 is desired, then $n$ should be between 11 and 21. It can be seen that when $n = 13$, the power is approximately 0.90.
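The $\phi$ row of the table follows from $\phi = 0.6\sqrt{n}$. The sketch below reproduces it using the values in the text ($3^2 + 3^2 + 3^2 = 27$, $\sigma_e^2 = 25$, $a = 3$); it does not compute the power row, which comes from Appendix Table VII. The helper name is illustrative.

```python
import math


def phi_for_n(n, alphas, sigma2):
    """phi = sqrt(n * sum(alpha_i^2) / (a * sigma_e^2)) for a common group size n."""
    a = len(alphas)
    return math.sqrt(n * sum(x ** 2 for x in alphas) / (a * sigma2))


alphas = [3.0, 3.0, 3.0]    # squared magnitudes summed as in the text's calculation
sigma2 = 25.0
table = {n: round(phi_for_n(n, alphas, sigma2), 1) for n in (3, 5, 7, 9, 11, 21)}
print(table)

# phi' = phi / sqrt(n), the Feldt-Mahmoud abscissa, does not depend on n
phi_prime = phi_for_n(1, alphas, sigma2)
```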


To facilitate the computation of the sample size in the absence of noncentral 
F distribution tables, special tables and charts have been prepared by Feldt and 
Mahmoud (1958a, b), Kastenbaum et al. (1970a), Bowman and Kastenbaum 
(1975), Cohen (1988, Chapter 8), and Day and Graham (1991). Here, we briefly 
describe the use of the Feldt-Mahmoud charts. The charts, reproduced in Appendix Chart IV, give the values of $n$ (Y-scale) as a function of $\phi' = \sqrt{\sum_{i=1}^a \alpha_i^2/(a\sigma_e^2)}$ (X-scale) for specified values of the number of factor levels ($a$), the level of significance ($\alpha$), and the power ($1 - \beta$). The only difference between $\phi$ and $\phi'$ is that $\phi'$ does not involve $n$, because we wish to determine it. Thus, the relation of $\phi'$ to $\phi$ is simply

$$\phi' = \phi/\sqrt{n}.$$

Two levels of significance are used in the charts, namely, $\alpha = 0.05$ and $\alpha = 0.01$. The charts are given for $r = a = 2, 3, 4, 5$, and the values of $1 - \beta$ employed in the charts are 0.5, 0.7, 0.8, 0.9, and 0.95. There are two X-scales, depending on which level of significance is employed. Furthermore, the left set of curves on each chart refers to $\alpha = 0.05$ and the right set to $\alpha = 0.01$. There are separate curves for different values of $P = 1 - \beta$, and the curves are indexed according to the value of $P$ at the top of the chart. Since only selected values of $P$ are used in the charts, one needs to interpolate for intermediate values of $1 - \beta$. The sample size $n$ may be read from the ordinate of the curve.
Example 2. Suppose an engineer wishes to determine whether four leading brands of light bulbs have the same mean life. The overall F test is to be made at $\alpha = 0.05$ with a power of $1 - \beta = 0.9$, and the value of $\phi'$ is found to be 0.8. To determine the required sample size, we refer to the chart for $r = 4$, locate the curve for $P = 0.9$ in the left set of curves corresponding to $\alpha = 0.05$, and read the value of the ordinate at $\phi' = 0.8$ on the X-scale. We find the value of $n$ to be approximately equal to 7. Hence, 7 bulbs of each brand should be tested to meet the given specifications.


SAMPLE SIZE DETERMINATION USING SMALLEST 
DETECTABLE DIFFERENCE 


The problem of sample size determination considered previously makes use of the specified values of $\sum_{i=1}^a \alpha_i^2$ and $\sigma_e^2$. However, as indicated there, the task of specifying $\sum_{i=1}^a \alpha_i^2$ is a difficult one. Moreover, for fixed effects, $\sum_{i=1}^a \alpha_i^2$ is difficult to interpret as a meaningful measure. Instead, one can define the sensitivity in terms of the magnitude of the difference between any pair of the $\alpha_i$'s, say $\Delta$, which it is meaningful to detect with a reasonably high probability. Suppose that for at least one pair of treatments $|\alpha_i - \alpha_j| \ge \Delta$ $(i \ne j)$. Now, the minimum value of $\lambda$ ($\lambda_{\min}$) in (2.17.2), subject to the condition that at least two of the $\alpha_i$'s differ by $\Delta$ or more, occurs when the $\alpha_i$'s are such that $\alpha_j = \Delta/2$, $\alpha_k = -\Delta/2$, and $\alpha_i = 0$ for all $i \ne j$, $i \ne k$; that is, when only two of the $\alpha_i$'s differ by $\Delta$ and the remaining $a - 2$ $\alpha_i$'s are zero.¹⁶ If the specified power of the test, $1 - \beta$, is determined for $\lambda_{\min}$, then, since power increases with $\lambda$, the power will be at least as large for all sets of $\alpha_i$'s satisfying the previous condition.
Now, from equation (2.17.2), it follows that

$$\lambda_{\min} = \frac{n \times 2 \times (\Delta/2)^2}{2\sigma_e^2} = \frac{n\Delta^2}{4\sigma_e^2}.$$


Again, for a given value of $\Delta$, knowledge of $\sigma_e$ is necessary to determine the required value of the sample size $n$. Since $\sigma_e$ is often not known precisely, the sensitivity parameter is expressed as $\Delta/\sigma_e$ rather than $\Delta$ itself. On using equation (2.17.1) for the power of the test with $\lambda = n\Delta^2/(4\sigma_e^2)$, the smallest value of $n$ can be determined such that $1 - \beta \ge 1 - \beta_0$, where $\beta$ is the actual value of the type II error and $\beta_0$ is the specified one. This implies that the actual power is at least as large as the specified value. Note that when there are only two treatments, the problem is equivalent to that of the two-sample, two-sided Student's t test. Furthermore, it has been shown by Nelson (1983) that the sample sizes obtained for a fixed effects analysis are generally comparable to those obtained using the analysis of means method.

16 Let the difference between the largest treatment effect, $\alpha_{\max}$, and the smallest treatment effect, $\alpha_{\min}$, be denoted by $\Delta = \alpha_{\max} - \alpha_{\min}$. For any set of $\alpha_i$ $(i = 1, 2, \ldots, a)$ satisfying this condition, the smallest value of $\lambda$ given by (2.17.2) is obtained when the remaining $a - 2$ treatment effects are $\alpha_i = (\alpha_{\max} + \alpha_{\min})/2$. Since the $\alpha_i$'s satisfy the constraint $\sum_{i=1}^a \alpha_i = 0$, this implies that $\alpha_{\max} = \Delta/2$, $\alpha_{\min} = -\Delta/2$, and $\alpha_i = 0$ otherwise.


Remark: Tables of sample sizes using this approach were prepared by Bratcher et al. (1970) for $\alpha$ = 0.5, 0.4, 0.3, 0.25, 0.2, 0.1, 0.05, 0.01; $1 - \beta$ = 0.7, 0.8, 0.9, 0.95; $\Delta/\sigma_e$ = 1.0 (0.25) 2, 2.5, 3; and $a$ = 2 (1) 11, 13, 16, 21, 25, and 31. Nelson (1985) extended these tables to some additional values; that is, $\alpha$ = 0.1, 0.05, 0.01; $1 - \beta$ = 0.5, 0.8, 0.9, 0.95; $\Delta/\sigma_e$ = 0.4 (0.1) 1 (0.2) 2 (0.5) 3.0; and $a$ = 2 (1) 9. Some of these tables are reprinted in Appendix Table IX. A similar but more comprehensive set of tables has been developed by Bowman (1972) and Bowman and Kastenbaum (1975).


Example 3. Consider a one-way layout involving three treatments ($a = 3$), a significance level of $\alpha = 0.05$, and a type II error of $\beta = 0.2$. For an effect size of $\Delta/\sigma_e = 1.5$, Appendix Table IX shows that a sample size of 10 for each treatment will be required.
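The search for the smallest $n$ can also be sketched in code. This is an approximation, not the method behind Appendix Table IX: it assumes $\lambda_{\min} = n\Delta^2/(4\sigma_e^2)$ with $\nu_1 = a - 1$ and $\nu_2 = a(n - 1)$, and evaluates the power with the normal approximation following (2.17.5) instead of the exact noncentral F distribution. The central F quantile is found by bisection on an incomplete-beta CDF, so only the standard library is needed; all names are hypothetical.

```python
import math
from statistics import NormalDist


def _betacf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for I_x(a, b) (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def f_cdf(x, d1, d2):
    """P(F[d1, d2] <= x) via the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    a, b, z = d1 / 2.0, d2 / 2.0, d1 * x / (d1 * x + d2)
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(z) + b * math.log1p(-z))
    if z < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, z) / a
    return 1.0 - front * _betacf(b, a, 1.0 - z) / b


def f_ppf(p, d1, d2):
    """Central F quantile F[d1, d2; p] by bisection on f_cdf."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f_cdf(mid, d1, d2) < p else (lo, mid)
    return 0.5 * (lo + hi)


def approx_power(nu1, nu2, lam, f_crit):
    """1 - beta ~ Phi(z_{1-beta}), normal approximation following (2.17.5)."""
    c, d = nu1 + 2.0 * lam, nu1 + 4.0 * lam
    num = math.sqrt(nu2 * (2.0 * c * c - d)) - math.sqrt(nu1 * c * (2 * nu2 - 1) * f_crit)
    return NormalDist().cdf(num / math.sqrt(nu1 * c * f_crit + nu2 * d))


def smallest_n(a, alpha, target_power, delta_over_sigma, n_max=500):
    """Smallest common group size n whose approximate power at lambda_min reaches the target."""
    for n in range(2, n_max + 1):
        nu1, nu2 = a - 1, a * (n - 1)
        lam_min = n * delta_over_sigma ** 2 / 4.0      # n * Delta^2 / (4 sigma_e^2)
        f_crit = f_ppf(1.0 - alpha, nu1, nu2)
        if approx_power(nu1, nu2, lam_min, f_crit) >= target_power:
            return n
    return None


print(smallest_n(a=3, alpha=0.05, target_power=0.80, delta_over_sigma=1.5))
```

For $a = 3$, $\alpha = 0.05$, $1 - \beta = 0.80$, and $\Delta/\sigma_e = 1.5$, the approximation first reaches the target power at $n = 10$, in line with Example 3; in borderline cases an approximation of this kind could differ from exact tables.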


2.19 INFERENCE ABOUT THE DIFFERENCE BETWEEN 
TREATMENT MEANS: MULTIPLE COMPARISONS 


In an analysis of variance problem involving the comparison of a group of treatment means, simply stating that the group means are significantly different may not be sufficient. In addition, the investigator probably also wants to know which particular means differ significantly from others, or whether there is some relation among them. For example, in many controlled experiments, the investigator plans the experiment in order to estimate and test hypotheses regarding a limited number of specific quantities. The analysis of variance F test does not directly provide answers to these questions. Test procedures known as multiple comparisons have been developed to answer questions such as these. A full discussion of these procedures is beyond the scope of this volume. We present only a brief introduction to some of them.


Remark: For detailed discussions of the topic, the interested reader is referred to the survey papers by Kurtz et al. (1965), O'Neill and Wetherill (1971), Chew (1976a, b, c), Miller (1977, 1985), Krishnaiah (1979), Stoline (1981), and Tukey (1991), as well as the books by Miller (1966, 1981), Rosenthal and Rosnow (1985), Hochberg and Tamhane (1987), Toothaker (1991), and Hsu (1996). Several standard textbooks on statistics also contain references to many of these procedures. The survey paper by O'Neill and Wetherill (1971) also provides a selected and classified bibliography of the subject.


We begin with a definition of a linear combination of means, a contrast and 
orthogonal contrasts. 
LINEAR COMBINATION OF MEANS, CONTRAST AND ORTHOGONAL CONTRASTS

Any expression of the form

$$L = \ell_1\mu_1 + \ell_2\mu_2 + \cdots + \ell_a\mu_a, \tag{2.19.1}$$

where the $\ell_i$'s are arbitrary constants, is called a linear combination of means. If one adds the constraint that $\sum_{i=1}^a \ell_i = 0$, then the linear combination is called a contrast of the $a$ means $\mu_i$, $i = 1, 2, \ldots, a$. The expressions $\mu_1 - \mu_3$ and $\mu_1 - 2\mu_2 + \mu_3$ are examples of contrasts. Two contrasts $L_1$ and $L_2$ defined by

$$L_1 = \ell_1\mu_1 + \ell_2\mu_2 + \cdots + \ell_a\mu_a$$

and

$$L_2 = \ell_1'\mu_1 + \ell_2'\mu_2 + \cdots + \ell_a'\mu_a$$

are said to be orthogonal (when the means are based on equal numbers of observations) if

$$\ell_1\ell_1' + \ell_2\ell_2' + \cdots + \ell_a\ell_a' = 0.$$

The expressions $\mu_1 - \mu_2$ and $\mu_1 + \mu_2 - \mu_3 - \mu_4$ are examples of orthogonal contrasts.
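The two definitions translate directly into checks on coefficient vectors. A small sketch for the equal-sample-size case treated here; the helper names are illustrative.

```python
def is_contrast(coeffs):
    """A linear combination sum(l_i * mu_i) is a contrast when sum(l_i) == 0."""
    return sum(coeffs) == 0


def are_orthogonal(c1, c2):
    """Equal-n orthogonality: l_1*l'_1 + ... + l_a*l'_a == 0."""
    return sum(x * y for x, y in zip(c1, c2)) == 0


# Examples from the text (coefficients on mu_1, ..., mu_4)
c_a = [1, 0, -1, 0]     # mu_1 - mu_3
c_b = [1, -2, 1, 0]     # mu_1 - 2 mu_2 + mu_3
c_1 = [1, -1, 0, 0]     # mu_1 - mu_2
c_2 = [1, 1, -1, -1]    # mu_1 + mu_2 - mu_3 - mu_4

print(is_contrast(c_a), is_contrast(c_b))   # True True
print(are_orthogonal(c_1, c_2))             # True
print(are_orthogonal(c_1, c_a))             # False: 1*1 + (-1)*0 + 0*(-1) + 0*0 = 1
```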


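Given a linear combination or a contrast of factor level means defined in (2.19.1), we can estimate it unbiasedly by

$$\hat L = \ell_1\bar y_{1.} + \ell_2\bar y_{2.} + \cdots + \ell_a\bar y_{a.}. \tag{2.19.2}$$

The sum of squares associated with $\hat L$ is defined by

$$SS_L = \frac{\left(\sum_{i=1}^a \ell_i\bar y_{i.}\right)^2}{\sum_{i=1}^a \ell_i^2/n_i}. \tag{2.19.3}$$

If $n_i = n$, $i = 1, 2, \ldots, a$, then (2.19.3) reduces to

$$SS_L = \frac{n\left(\sum_{i=1}^a \ell_i\bar y_{i.}\right)^2}{\sum_{i=1}^a \ell_i^2}. \tag{2.19.4}$$

It is easy to show that the expressions (2.19.3) and (2.19.4) are general formulae for any single-degree-of-freedom sum of squares: when $L = 0$, $SS_L/\sigma_e^2$ is distributed as a chi-square with one degree of freedom.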
If we assume that each of the sample means is based on the same number
of observations, then it can be shown that a — 1 orthogonal contrasts can be 
formed using a sample means. These a — 1 contrasts form a set of mutually 
orthogonal contrasts. In addition, it can be shown that the sums of squares of 
the a — 1 orthogonal contrasts will add up to the between-group sum of squares. 
In other words, the between-group sum of squares can be partitioned into a — 1 
sums of squares each having one degree of freedom corresponding to the a — 1 
orthogonal contrasts. Furthermore, mutual orthogonality is desirable because 
it leads to independence of the a — 1 sums of squares associated with the 
orthogonal contrasts. 


Remark: When the levels of a factor are quantitative rather than qualitative in nature, 
the investigator often wants to know whether the means follow a systematic pattern or 
trend; say, linear, quadratic, cubic, and the like. In such a case, the particular contrasts of 
interest involve those measuring linear, quadratic, cubic, or other higher-order trends in a
series of means. As usual, with $a$ means, we have $a - 1$ orthogonal contrasts, each having
one degree of freedom. Thus, for three means, we have only two orthogonal contrasts:
the linear and quadratic. With four means, there are three orthogonal contrasts: the linear,
quadratic, and cubic, and so on. Many books on statistical design provide coefficients
for linear, quadratic, cubic, and other contrasts (see, e.g., Fisher and Yates (1963)).
The most complete set of coefficients is given by Anderson and Houseman (1942).
This type of analysis can be used to measure the trend of factor-level means associated
with equally spaced values of a quantitative variable. It can also be used to assess the
trend components of means obtained at fixed intervals of time; for example, in ongoing
surveys of population characteristics and other time series data. 


Example 1. For the data on the yields of four different varieties of wheat given in Table 2.3, consider the following set of three mutually orthogonal contrasts:

$$\begin{aligned} L_1 &= \mu_1 - \mu_2, \\ L_2 &= \mu_3 - \mu_4, \\ L_3 &= \mu_1 + \mu_2 - \mu_3 - \mu_4. \end{aligned}$$

The unbiased estimates of the preceding contrasts are obtained as

$$\begin{aligned} \hat L_1 &= \bar y_{1.} - \bar y_{2.} = 69 - 92 = -23, \\ \hat L_2 &= \bar y_{3.} - \bar y_{4.} = 63 - 80 = -17, \end{aligned}$$

and

$$\hat L_3 = \bar y_{1.} + \bar y_{2.} - \bar y_{3.} - \bar y_{4.} = 69 + 92 - 63 - 80 = 18.$$

The corresponding sums of squares are calculated as

$$\begin{aligned} SS_{L_1} &= \frac{6(-23)^2}{1 + 1} = 1587, \\ SS_{L_2} &= \frac{6(-17)^2}{1 + 1} = 867, \\ SS_{L_3} &= \frac{6(18)^2}{1 + 1 + 1 + 1} = 486. \end{aligned}$$

Now, it is readily verified that

$$SS_B = SS_{L_1} + SS_{L_2} + SS_{L_3};$$

that is, $2940 = 1587 + 867 + 486$.
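The partition of $SS_B$ can be verified numerically from (2.19.3). This sketch uses the wheat means and $n = 6$ from the example; `contrast_ss` is an illustrative helper, and $SS_B$ is recomputed independently from the means as $\sum n_i(\bar y_{i.} - \bar y_{..})^2$.

```python
def contrast_ss(coeffs, means, n):
    """SS_L = (sum l_i * ybar_i)^2 / sum(l_i^2 / n_i), equation (2.19.3)."""
    est = sum(l * y for l, y in zip(coeffs, means))
    return est ** 2 / sum(l ** 2 / ni for l, ni in zip(coeffs, n))


means = [69, 92, 63, 80]          # wheat varieties I-IV (Table 2.3)
n = [6, 6, 6, 6]
contrasts = [[1, -1, 0, 0], [0, 0, 1, -1], [1, 1, -1, -1]]
ss = [contrast_ss(c, means, n) for c in contrasts]

# Between-group sum of squares computed directly from the means
grand = sum(ni * y for ni, y in zip(n, means)) / sum(n)
ss_between = sum(ni * (y - grand) ** 2 for ni, y in zip(n, means))
print([round(s, 6) for s in ss], round(sum(ss), 6), round(ss_between, 6))
```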


TEST OF HYPOTHESIS INVOLVING A CONTRAST 


Since the $\bar y_{i.}$'s are independent, the variance of (2.19.2) is given by

$$\operatorname{Var}(\hat L) = \sum_{i=1}^a \ell_i^2\operatorname{Var}(\bar y_{i.}) = \sigma_e^2\sum_{i=1}^a \ell_i^2/n_i.$$

An unbiased estimate of this variance is

$$\widehat{\operatorname{Var}}(\hat L) = MS_W\sum_{i=1}^a \ell_i^2/n_i. \tag{2.19.5}$$

Inasmuch as $\hat L$ is a linear combination of independent normal random variables, it is also normally distributed. It then follows that the statistic

$$(\hat L - L)\Big/\sqrt{\widehat{\operatorname{Var}}(\hat L)} \tag{2.19.6}$$

has a Student's t distribution with $N - a$ degrees of freedom. Therefore, a suitable test statistic for testing the null hypothesis

$$H_0: L = L_0 \quad \text{versus} \quad H_1: L \ne L_0 \tag{2.19.7}$$

is

$$t[N - a] = \frac{\hat L - L_0}{\sqrt{MS_W\sum_{i=1}^a \ell_i^2/n_i}} \tag{2.19.8}$$

or, equivalently,

$$F[1, N - a] = \frac{(\hat L - L_0)^2}{MS_W\sum_{i=1}^a \ell_i^2/n_i}. \tag{2.19.9}$$

A two-sided critical region is used with the t test given by (2.19.8), whereas the critical region for the F test given by (2.19.9) is determined by the right tail. Finally, a $100(1 - \alpha)$ percent confidence interval for $L$ is given by

$$P\left[\hat L - w < L < \hat L + w\right] = 1 - \alpha, \tag{2.19.10}$$

where

$$w = t[N - a, 1 - \alpha/2]\sqrt{MS_W\sum_{i=1}^a \ell_i^2/n_i} \tag{2.19.11}$$

and $t[N - a, 1 - \alpha/2]$ denotes the $100(1 - \alpha/2)$th percentage point of the t distribution with $N - a$ degrees of freedom.
Example 2. For the data on blood analysis of animals injected with five drugs given in Table 2.5, consider the contrast $L$ defined by

$$L = \mu_1 + 2\mu_2 - \mu_3 - \mu_4 - \mu_5.$$

An unbiased estimate of this contrast is

$$\hat L = \bar y_{1.} + 2\bar y_{2.} - \bar y_{3.} - \bar y_{4.} - \bar y_{5.} = 15 + 2(4) - 6 - 7 - 4 = 6.$$

A test for the hypothesis

$$H_0: L = 0 \quad \text{versus} \quad H_1: L \ne 0$$

can be carried out by calculating the value of statistic (2.19.8) with $L_0 = 0$. In this example, $a = 5$, $n_1 = 3$, $n_2 = 4$, $n_3 = 3$, $n_4 = 4$, $n_5 = 3$, and $MS_W = 6.333$. Then, on substituting in (2.19.8), we obtain

$$t = \frac{6}{\sqrt{6.333\left(\frac{1}{3} + \frac{4}{4} + \frac{1}{3} + \frac{1}{4} + \frac{1}{3}\right)}} = \frac{6}{\sqrt{14.249}} = 1.59.$$

From Appendix Table III, we find that $t[12, 0.975] = 2.179$, so that $H_0$ is sustained at the $\alpha = 0.05$ level of significance.

Finally, on substituting the appropriate quantities in (2.19.10), a 95 percent confidence interval for $L$ is given by

$$P\left[6 - 2.179\sqrt{14.249} < L < 6 + 2.179\sqrt{14.249}\right] = 0.95,$$

or

$$P[-2.23 < L < 14.23] = 0.95.$$

Since the interval includes the value zero, we conclude that $L$ is not significantly different from zero. Thus, the results of the confidence interval are in agreement with those of the t test given above.
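The t test (2.19.8) and interval (2.19.10) for a single prechosen contrast can be sketched as follows. The tabulated $t[12, 0.975] = 2.179$ is passed in rather than computed, and the function name is hypothetical.

```python
import math


def contrast_t_interval(coeffs, means, n, ms_w, t_crit, L0=0.0):
    """t statistic (2.19.8) and interval (2.19.10)-(2.19.11) for a single planned contrast."""
    est = sum(l * y for l, y in zip(coeffs, means))
    se = math.sqrt(ms_w * sum(l ** 2 / ni for l, ni in zip(coeffs, n)))
    t = (est - L0) / se
    w = t_crit * se                 # half-width w of (2.19.11)
    return est, t, (est - w, est + w)


# Example 2: five drugs (Table 2.5); t[12, 0.975] = 2.179 from the tables
est, t, ci = contrast_t_interval([1, 2, -1, -1, -1], [15, 4, 6, 7, 4],
                                 [3, 4, 3, 4, 3], ms_w=6.333, t_crit=2.179)
print(est, round(t, 2), tuple(round(v, 2) for v in ci))
```

Since $|t| \approx 1.59 < 2.179$, the sketch leads to the same conclusion as the example.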
THE USE OF MULTIPLE COMPARISONS

After rejecting the null hypothesis (2.7.1), one might be tempted to make a comparison between each pair of factor level means, that is, 1 versus 2, ..., 1 versus $a$, 2 versus 3, ..., $a - 1$ versus $a$, by using the test procedures (2.19.8) or (2.19.9). But how many such comparisons need to be made? For $a$ factor levels there are $\binom{a}{2} = \frac{1}{2}a(a - 1)$ pairs to be compared, although there are only $a - 1$ degrees of freedom for factor levels. Clearly, not all such comparisons are independent. Thus, it is not proper to use tests on more than $a - 1$ such comparisons. Further, suppose that an experimenter wishes to compare $a$ factor levels using $c$ independent (orthogonal) contrasts. If each of the comparisons is tested at the same significance level, say $\alpha$, and if we assume that $MS_W$ has an infinite number of degrees of freedom (so that the tests are independent), then when all the null hypotheses involving the $c$ comparisons are true, the probability of falsely rejecting at least one of them is equal to $1 - (1 - \alpha)^c$. For $\alpha = 0.05$ and $c = 5$, this probability is $1 - (0.95)^5 = 0.2262$, and for $c = 10$, the probability increases to $1 - (0.95)^{10} = 0.4013$. Thus, if we do not reject the null hypothesis with the initial F test, and if we then perform tests based on contrasts, we increase the overall probability of committing a type I error.¹⁷ Moreover, it is difficult to obtain an expression equivalent to $1 - (1 - \alpha)^c$ for comparisons made with nonindependent (nonorthogonal) contrasts. The difficulties just described in connection with the test procedures (2.19.8) or (2.19.9), or the confidence interval (2.19.10), are resolved by using a multiple comparison method. Several multiple comparison procedures are available in the literature. In the following, we consider two widely employed procedures, known as Tukey's method and Scheffé's method.
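The growth of the overall error rate with the number of independent comparisons can be tabulated in a couple of lines. The sketch assumes independent tests, as in the text.

```python
def familywise_error(alpha, c):
    """P(at least one false rejection) for c independent tests, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** c


for c in (1, 5, 10):
    print(c, round(familywise_error(0.05, c), 4))
```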


17 Since we are considering individual comparisons as well as sets of such comparisons, there are two different types of error rates or significance levels at issue. When considering individual comparisons, the significance level is referred to as the comparisonwise error rate. When considering an entire set of comparisons, the significance level associated with all the comparisons in the set is called the experimentwise error rate.

Tukey's method

According to this method, if $L$ is any contrast estimated by $\hat L$, then a $100(1 - \alpha)$ percent simultaneous confidence interval for $L$ is given by

$$\hat L - T\sqrt{n^{-1}MS_W}\left(\frac{1}{2}\sum_{i=1}^a |\ell_i|\right) < L < \hat L + T\sqrt{n^{-1}MS_W}\left(\frac{1}{2}\sum_{i=1}^a |\ell_i|\right), \tag{2.19.12}$$

where $T = q[a, a(n - 1); 1 - \alpha]$ is the $100(1 - \alpha)$th percentage point of the Studentized range distribution with parameters $a$ and $a(n - 1)$. (For a definition of the Studentized range distribution, see Appendix I.) Some selected percentage points of the Studentized range distribution are given in Appendix Table X. Tukey (1953) has shown that for a given value of $\alpha$, the intervals given by (2.19.12) hold simultaneously for every possible contrast that may be constructed (see also Scheffé (1959, p. 74)). If the interval contains the value zero, the contrast is said to be not significantly different from zero, whereas if the interval does not contain the value zero, the contrast is said to be significantly different from zero. Thus, to test the null hypothesis (2.19.7) with $L_0 = 0$, we note whether

$$\frac{|\hat L|}{\sqrt{n^{-1}MS_W}\left(\frac{1}{2}\sum_{i=1}^a |\ell_i|\right)} > q[a, a(n - 1); 1 - \alpha]. \tag{2.19.13}$$

Tukey's method was originally designed for contrasts comparing two means, that is, $L = \mu_1 - \mu_2$, and so on. It is seldom used in practice except for this special contrast. For this situation,

$$\frac{1}{2}\sum_{i=1}^a |\ell_i| = \frac{1}{2}\left(|1| + |-1|\right) = 1,$$

and the intervals given by (2.19.12) reduce to

$$\bar y_{i.} - \bar y_{i'.} - T\sqrt{n^{-1}MS_W} < \mu_i - \mu_{i'} < \bar y_{i.} - \bar y_{i'.} + T\sqrt{n^{-1}MS_W}. \tag{2.19.14}$$

Thus, according to Tukey's method, the probability is $1 - \alpha$ that the intervals (2.19.14) contain all $a(a - 1)/2$ pairwise differences of the type $\mu_i - \mu_{i'}$, $i \ne i'$.


Example 3. To illustrate this procedure, consider the data on the yield of four different varieties of wheat given in Table 2.3. Here, $a = 4$, $n = 6$, $N - a = 20$, and $MS_W = 163.600$. If we let $\alpha = 0.05$, then from the values of the Studentized range distribution given in Appendix Table X, $q[4, 20; 0.95] = 3.58$. Now, the pairwise differences of sample means will be compared to

$$q[4, 20; 0.95]\sqrt{n^{-1}MS_W} = 3.58\sqrt{163.600/6} = 18.69.$$

Note that there are $4(3)/2 = 6$ pairs of differences to be compared. The four sample means are

$$\bar y_{1.} = 69, \quad \bar y_{2.} = 92, \quad \bar y_{3.} = 63, \quad \text{and} \quad \bar y_{4.} = 80;$$

and the 6 pairs of differences can be arranged systematically as in Table 2.11.

TABLE 2.11
Pairwise Differences of Sample Means

$\bar y_{i.} - \bar y_{i'.}$

The differences above the dotted line exceed 18.69. The conclusions are that variety II is significantly better than I and III. There is no significant difference between varieties II and IV. The probability that we have made one or more incorrect statements is 0.05.
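The comparisons summarized in Table 2.11 can be regenerated from the four sample means. The sketch below computes the Tukey critical difference $q\sqrt{MS_W/n}$ from (2.19.14), with $q[4, 20; 0.95] = 3.58$ taken from Appendix Table X, and flags each pairwise difference that exceeds it.

```python
import math
from itertools import combinations

means = {"I": 69, "II": 92, "III": 63, "IV": 80}   # Table 2.3
n, ms_w, q_crit = 6, 163.600, 3.58                 # q[4, 20; 0.95] from the tables

hsd = q_crit * math.sqrt(ms_w / n)                 # Tukey critical difference, (2.19.14)
significant = {}
for (g1, y1), (g2, y2) in combinations(means.items(), 2):
    diff = y1 - y2
    significant[(g1, g2)] = abs(diff) > hsd
    print(f"{g1} - {g2}: {diff:+d}  {'significant' if abs(diff) > hsd else 'n.s.'}")
print(round(hsd, 2))
```

Only the differences involving variety II against I and III exceed 18.69, matching the conclusions of Example 3.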


Example 4. To illustrate Tukey's method for a more complex contrast, say, $L = \mu_1 + \mu_4 - \mu_2 - \mu_3$, we use the same data as in Example 3. Here,

$$\frac{1}{2}\sum_{i=1}^a |\ell_i| = \frac{1}{2}(1 + 1 + 1 + 1) = 2, \qquad T = q[4, 20; 0.95] = 3.58,$$

$$T\sqrt{n^{-1}MS_W}\left(\frac{1}{2}\sum_{i=1}^a |\ell_i|\right) = 3.58\sqrt{163.600/6}\,(2) = 37.39,$$

$$\hat L = 69 + 80 - 92 - 63 = -6.$$

Now, substituting the appropriate quantities in (2.19.12), we get a 95 percent simultaneous confidence interval for $L$ as

$$-6 - 37.39 < L < -6 + 37.39,$$

or

$$-43.39 < L < 31.39.$$

Since the interval includes the value zero, we conclude that $L$ is not significantly different from zero.
Scheffé's method

According to this method, if $L$ is any contrast estimated by $\hat L$, then a $100(1 - \alpha)$ percent simultaneous confidence interval for $L$ is given by

$$\hat L - S\sqrt{(a - 1)MS_W\sum_{i=1}^a \ell_i^2/n_i} < L < \hat L + S\sqrt{(a - 1)MS_W\sum_{i=1}^a \ell_i^2/n_i}, \tag{2.19.15}$$

where $S^2 = F[a - 1, N - a; 1 - \alpha]$ is the $100(1 - \alpha)$th percentage point of the F distribution with $a - 1$ and $N - a$ degrees of freedom. For a given value of $\alpha$, Scheffé (1953) has shown that the intervals given by (2.19.15) hold simultaneously for every possible contrast that can be constructed (see also Scheffé (1959, p. 69)).¹⁸ Again, as in Tukey's method, to test the null hypothesis (2.19.7) with $L_0 = 0$, we note whether¹⁹

$$\frac{|\hat L|}{\sqrt{(a - 1)MS_W\sum_{i=1}^a \ell_i^2/n_i}} > \left\{F[a - 1, N - a; 1 - \alpha]\right\}^{1/2}. \tag{2.19.16}$$


18 For a simple proof of the result (2.19.15) using elementary calculus, see Klotz (1969).
19 For an extension of Scheffé's method that tests all possible sets of comparisons encompassing all pairs of means, all possible contrasts between groups of means, and all possible partitions of the means, see Gabriel (1964).

Example 5. To illustrate this procedure, we again use the data of Table 2.3. As before, $a = 4$, $n = 6$, $N - a = 20$, and $MS_W = 163.600$. If we again let $\alpha = 0.05$, then from Appendix Table V, we find that $S^2 = F[3, 20; 0.95] = 3.10$. Furthermore, for the contrasts consisting of the differences of two means, we have

$$\sum_{i=1}^a \ell_i^2/n_i = \frac{1 + 1}{6} = \frac{1}{3}.$$

Now, the differences between the sample means will be compared to

$$\sqrt{(4 - 1)(163.600)\left(\tfrac{1}{3}\right)(3.10)} = 22.52,$$

instead of 18.69 as in Tukey's method. Hence, in Table 2.11, again the sample differences $\bar y_{2.} - \bar y_{3.}$ and $\bar y_{2.} - \bar y_{1.}$ are significant by Scheffé's method. It should, however, be observed that the critical value for mean differences for Scheffé's method is larger than for Tukey's method. In general, for simple contrasts of this type, Tukey's method gives shorter intervals and consequently finds more differences significant.
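A direct comparison of the two critical differences for pairwise contrasts, using the tabulated values from Examples 3 and 5 ($q = 3.58$, $S^2 = 3.10$), makes the point concrete.

```python
import math

a, n, ms_w = 4, 6, 163.600
q_crit = 3.58      # q[4, 20; 0.95]
f_crit = 3.10      # S^2 = F[3, 20; 0.95]

tukey = q_crit * math.sqrt(ms_w / n)                            # (2.19.14)
scheffe = math.sqrt(f_crit * (a - 1) * ms_w * (1 / n + 1 / n))  # (2.19.15) with sum l_i^2/n_i = 2/6

diffs = {"II - I": 92 - 69, "II - III": 92 - 63, "II - IV": 92 - 80,
         "IV - III": 80 - 63, "IV - I": 80 - 69, "I - III": 69 - 63}
sig_scheffe = sorted(k for k, d in diffs.items() if d > scheffe)
print(round(tukey, 2), round(scheffe, 2), sig_scheffe)
```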


Example 6. To illustrate the computation of a more complex contrast, again, as in Tukey's method, consider $L = \mu_1 + \mu_4 - \mu_2 - \mu_3$. Then

$$\begin{aligned} \hat L &= 69 + 80 - 92 - 63 = -6, \\ \sum_{i=1}^a \ell_i^2/n_i &= (1 + 1 + 1 + 1)/6 = \tfrac{2}{3}, \\ S\sqrt{(a - 1)MS_W\textstyle\sum_{i=1}^a \ell_i^2/n_i} &= \sqrt{(3.10)(4 - 1)(163.600)\left(\tfrac{2}{3}\right)} = 31.85. \end{aligned}$$

On substituting the appropriate quantities in (2.19.15), we get a 95 percent simultaneous confidence interval for $L$ as

$$-6 - 31.85 < L < -6 + 31.85,$$

or

$$-37.85 < L < 25.85.$$

So in this case the interval given by Scheffé's method is shorter than that provided by Tukey's method. In general, for more complex contrasts, Scheffé's method gives shorter intervals than Tukey's method.


Example 7. To illustrate further, suppose that the experimenter had decided before conducting the experiment that the only contrast of interest is $L = \mu_1 + \mu_4 - \mu_2 - \mu_3$. Then we can get an interval for $L$ using (2.19.10), which is based upon the usual t distribution. Now, from Appendix Table III, $t[20, 0.975] = 2.086$, and, on substituting the appropriate quantities in (2.19.10), the 95 percent confidence interval for $L$ is obtained as

$$-6 - 2.086\sqrt{(163.600)\left(\tfrac{4}{6}\right)} < L < -6 + 2.086\sqrt{(163.600)\left(\tfrac{4}{6}\right)},$$

or

$$-27.78 < L < 15.78.$$

Thus, the interval constructed from (2.19.10) is much shorter than the one given by either Tukey's or Scheffé's method.


Naturally, we would expect to get a shorter interval from a procedure de- 
signed to capture one prechosen contrast than from procedures that try to catch 
all possible contrasts. However, it is quite unlikely that the experimenter would 
specify a contrast in advance. Usually, we first test the hypothesis of equal fac- 
tor level means. If this is rejected, we attempt to discover the contrasts that are 
significantly different from zero. As has been discussed earlier in this section, it 
is usually difficult to calculate the significance level associated with the several 
intervals of the type (2.19.10). Thus, we would use the intervals (2.19.12) and 
(2.19.15), given by Tukey’s method and Scheffé’s method, respectively, which 
may be computed for contrasts that are selected after the experiment is con- 
ducted and for which we can make an exact probability statement. 
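The three intervals for $L = \mu_1 + \mu_4 - \mu_2 - \mu_3$ from Examples 4, 6, and 7 can be put side by side. The sketch hard-codes the tabulated critical values used there ($q = 3.58$, $F = 3.10$, $t = 2.086$).

```python
import math

ms_w, n, a = 163.600, 6, 4
coeffs = [1, -1, -1, 1]             # mu_1 - mu_2 - mu_3 + mu_4
means = [69, 92, 63, 80]
est = sum(l * y for l, y in zip(coeffs, means))          # -6

sum_sq_over_n = sum(l ** 2 / n for l in coeffs)          # 4/6
half_widths = {
    "Tukey (2.19.12)": 3.58 * math.sqrt(ms_w / n) * 0.5 * sum(abs(l) for l in coeffs),
    "Scheffe (2.19.15)": math.sqrt(3.10 * (a - 1) * ms_w * sum_sq_over_n),
    "single t (2.19.10)": 2.086 * math.sqrt(ms_w * sum_sq_over_n),
}
for name, hw in half_widths.items():
    print(f"{name}: {est - hw:.2f} < L < {est + hw:.2f}")
```

The half-widths come out at about 37.39, 31.85, and 21.78 (the last is 21.785 before rounding), reproducing the ordering discussed above.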


Example 8. Finally, we illustrate Scheffé's method for the case of model (2.1.1) with unequal sample sizes by using the data on blood analysis of animals injected with five drugs given in Table 2.5. Let the contrast of interest be defined by

$$L = \mu_1 + 2\mu_2 - \mu_3 - \mu_4 - \mu_5,$$

which is estimated by

$$\hat{L} = 15 + 2(4) - 6 - 7 - 4 = 6.$$

Furthermore, in this case, we have $a = 5$, $n_1 = 3$, $n_2 = 4$, $n_3 = 3$, $n_4 = 4$, $n_5 = 3$, and $MS_W = 6.333$. For $\alpha = 0.05$, we find from Appendix Table V that $S^2 = F[4, 12; 0.95] = 3.26$. Now, for the given contrast, we get

$$\sum_{i=1}^{5}\frac{c_i^2}{n_i} = \frac{(1)^2}{3} + \frac{(2)^2}{4} + \frac{(-1)^2}{3} + \frac{(-1)^2}{4} + \frac{(-1)^2}{3} = 2.25$$

and

$$S\sqrt{(a-1)\,MS_W\sum_{i=1}^{5}\frac{c_i^2}{n_i}} = \sqrt{(3.26)(4)(6.333)(2.25)} = 13.63.$$

On substituting the appropriate quantities in (2.19.15), the 95 percent simultaneous confidence interval for L is obtained as

$$6.00 - 13.63 < L < 6.00 + 13.63$$

or

$$-7.63 < L < 19.63.$$

Note from Example 2 that this interval is wider than the interval based on the usual t distribution. Again, the interval includes zero, and hence the contrast is not significantly different from zero.
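The arithmetic of Example 8 can be checked with a short script. This is a minimal Python sketch, assuming the group means 15, 4, 6, 7, 4 that enter the computation of $\hat{L}$; the function names are hypothetical, and the critical values are supplied from tables rather than computed: $F[4, 12; 0.95] = 3.26$ as in the example, and $t[12, 0.975] = 2.179$ (a standard t table value, our assumption) for the single prechosen contrast interval (2.19.10).

```python
import math


def contrast_estimate(coefs, means):
    """L_hat = sum of c_i * ybar_i."""
    return sum(c * m for c, m in zip(coefs, means))


def variance_factor(coefs, ns):
    """sum of c_i^2 / n_i, the factor multiplying MS_W in Var(L_hat)."""
    return sum(c * c / n for c, n in zip(coefs, ns))


def t_interval(coefs, means, ns, ms_w, t_crit):
    """Interval (2.19.10) for one contrast chosen before seeing the data."""
    est = contrast_estimate(coefs, means)
    half = t_crit * math.sqrt(ms_w * variance_factor(coefs, ns))
    return est - half, est + half


def scheffe_interval(coefs, means, ns, ms_w, f_crit):
    """Simultaneous interval (2.19.15) with S^2 = F[a - 1, N - a; 1 - alpha]."""
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    a = len(coefs)
    est = contrast_estimate(coefs, means)
    half = math.sqrt((a - 1) * f_crit * ms_w * variance_factor(coefs, ns))
    return est - half, est + half


coefs = [1, 2, -1, -1, -1]   # L = mu1 + 2 mu2 - mu3 - mu4 - mu5
means = [15, 4, 6, 7, 4]
ns = [3, 4, 3, 4, 3]
scheffe = scheffe_interval(coefs, means, ns, ms_w=6.333, f_crit=3.26)
single = t_interval(coefs, means, ns, ms_w=6.333, t_crit=2.179)
print([round(x, 2) for x in scheffe])  # [-7.63, 19.63]
print([round(x, 2) for x in single])   # [-2.23, 14.23]
```

The comparison shows the price of simultaneity: the Scheffé half-width (about 13.63) is considerably larger than the single-contrast half-width (about 8.23) for the same data.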


Interpretation of Tukey’s and Scheffé’s methods 

Suppose one were to calculate the confidence intervals for all conceivable contrasts by using (2.19.12) or (2.19.15). Then, according to Tukey's and Scheffé's methods, the entire set of confidence intervals would be correct in 100(1 − α) percent of repetitions of the experiment. Note that this is a somewhat different interpretation from that of the ordinary confidence interval. When we make a 95 percent confidence interval for, say, a single parameter θ, we are correct in saying that if we took all possible random samples of size n and calculated the interval for each, the interval would cover the true value of θ in 95 percent of the cases. For the multiple comparisons, however, we are referring to all possible comparisons that might be made on a given set of data, and the probability statement is about the event that all such intervals computed from a set of data simultaneously cover the corresponding true values. It is, therefore, important that the initial F test be significant, giving us prior reason to believe that reliable departures from the hypothesis exist. These differences are then to be sought among the possible comparisons.
Remark: Researchers who have used Tukey's and Scheffé's methods have occasionally been somewhat surprised to find that a significant overall analysis of variance F test has not led to at least one significant contrast. We expect that if the overall test is significant at the α-level, then at least the maximum possible contrast will also be significant at the α-level. Unfortunately, the maximum possible contrast may have been of little interest and, therefore, may not have been computed. There is no guarantee that the obvious contrasts (i.e., the differences within pairs of means) or the contrasts most interesting to the experimenter will be significant when the overall F test is significant.


Comparison of Tukey's and Scheffé's methods
In the following, we provide relative merits and drawbacks of Tukey’s and 
Scheffé’s methods of multiple comparisons. 


1. Tukey's method can be used only with equal sample sizes for all factor levels, but Scheffé's method is applicable whether the sample sizes are equal or not.

2. Although Tukey's method is applicable to any general contrast, the procedure is most powerful when comparing simple pairwise differences and not when making more complex comparisons.

3. If only pairwise comparisons are of interest, and all factor levels have equal sample sizes, Tukey's method gives shorter confidence intervals and thus is more powerful.

4. In the case of comparisons involving general contrasts, Scheffé's method tends to give narrower confidence limits, and thus provides a more powerful significance test.

5. Scheffé's method has the property that if the F test produces significant results, then the corresponding Scheffé multiple comparison will detect at least one statistically significant contrast among all possible contrasts. Thus, we are able to draw more conclusions than merely that the factor level means are not all equal.

6. Scheffé's method requires tables of the F distribution, which are more readily available than the tables of the Studentized range distribution used by Tukey's method.

7. Scheffé's method is less sensitive to violations of the normality and homogeneity of variance assumptions than is Tukey's method.


OTHER MULTIPLE COMPARISON METHODS 


In addition to Tukey’s and Scheffé’s methods described previously, there are a 
number of other multiple comparison procedures that are used widely in many 
substantive fields of research. In the following, we briefly outline some other 
common procedures for making post hoc comparisons.


Least significant difference test 
The test, also commonly referred to as the protected least significant difference (LSD) test, was originally proposed by Fisher (1935). The test is carried out in steps as follows:

1. First, an overall F test at a given level of significance (α) is carried out to determine whether there are significant differences among the treatment groups.

2. Only if the F test in step 1 is significant, pairwise comparisons among treatments are performed using a t test at level α.


Assuming a two-sided alternative, the pair of means $\mu_i$ and $\mu_j$ would be declared significantly different if

$$|\bar{y}_{i.} - \bar{y}_{j.}| > t[N-a, 1-\alpha/2]\sqrt{MS_W\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}. \qquad (2.19.17)$$

The quantity to the right of the inequality in (2.19.17) is called the least significant difference. If the design is balanced, that is, $n_1 = n_2 = \cdots = n_a = n$, then it reduces to $t[a(n-1), 1-\alpha/2]\sqrt{2MS_W/n}$. To use the LSD procedure, one simply compares the observed difference between each pair of sample means with the corresponding least significant difference. If the sample difference exceeds this quantity, one may conclude that the pair of means is significantly different.


Remark: The use of the preliminary F test in the LSD procedure helps to protect the 
overall error rate under the null hypothesis of no differences in the set of treatment 
effects. However, once the overall null hypothesis has been rejected (for example, because a single real difference exists), the LSD approach does not protect against finding chance differences among the other comparisons.
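The two steps of the protected LSD can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the summary figures are reused from Example 8 purely to show the arithmetic, and the critical values are supplied from tables ($F[4, 12; 0.95] = 3.26$ as in Example 8, and $t[12, 0.975] = 2.179$, a standard t table value assumed here). The overall F statistic is computed from the group means, sizes, and $MS_W$.

```python
import math
from itertools import combinations


def f_from_summary(means, ns, ms_w):
    """Overall F = MS_B / MS_W from group means, sample sizes, and MS_W."""
    n_total = sum(ns)
    grand = sum(n * m for n, m in zip(ns, means)) / n_total
    ms_b = sum(n * (m - grand) ** 2 for n, m in zip(ns, means)) / (len(means) - 1)
    return ms_b / ms_w


def protected_lsd(means, ns, ms_w, f_crit, t_crit):
    """Step 1: overall F test; step 2: pairwise comparisons via (2.19.17).

    Returns the 1-based pairs (i, j) declared significantly different."""
    if f_from_summary(means, ns, ms_w) <= f_crit:
        return []  # F test not significant: no pairwise tests are carried out
    pairs = []
    for i, j in combinations(range(len(means)), 2):
        lsd = t_crit * math.sqrt(ms_w * (1 / ns[i] + 1 / ns[j]))
        if abs(means[i] - means[j]) > lsd:
            pairs.append((i + 1, j + 1))
    return pairs


means = [15, 4, 6, 7, 4]
ns = [3, 4, 3, 4, 3]
print(round(f_from_summary(means, ns, 6.333), 2))                  # 10.18
print(protected_lsd(means, ns, 6.333, f_crit=3.26, t_crit=2.179))  # [(1, 2), (1, 3), (1, 4), (1, 5)]
```

Because unequal sample sizes give each pair its own least significant difference, the sketch recomputes the bound inside the loop rather than using the balanced-design value.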


Bonferroni’s test 

The test is based on the principle that if there are k null hypotheses to be tested, then a desired overall error rate of at most α can be achieved by testing each null hypothesis at level α/k. Equivalently, if there are k confidence intervals, each constructed at confidence level 100(1 − α/k) percent, then they all hold simultaneously with confidence level of at least 100(1 − α) percent. To see how this works, suppose that each null hypothesis is tested at level $\alpha^*$ and let $E_i$ denote the event that the i-th null hypothesis is rejected. Then, when all k null hypotheses are true, the overall probability of a type I error (α) is given by

$$\alpha = P\{E_1 \cup E_2 \cup \cdots \cup E_k\} \le P(E_1) + P(E_2) + \cdots + P(E_k) = k\alpha^*.$$

Thus, if each one of the k null hypotheses is tested at level α/k, the overall error rate is at most α. The procedure is known as Bonferroni's method, since it is based on the Bonferroni or Boole inequality. Sometimes it is also referred to as Dunn's multiple comparison procedure following Dunn (1961), who examined the properties of the procedure in detail and prepared tables to facilitate its use. The method is fairly simple and versatile and gives reasonably good results if k is not very large. The procedure tends to be somewhat conservative, that is, true confidence levels tend to be greater than 1 − α. The method should be used when there are only a few comparisons to be made and none of the other procedures is appropriate.²⁰ Special percentage points of the t distribution for very small values of α are usually required to determine Bonferroni intervals. Specially designed tables for this purpose are given in Dunn (1961), Pearson and Hartley (1970, Table 9), Bailey (1977), Miller (1981, p. 238), and Kafadar and Tukey (1988). Moses (1978) provides charts for finding upper percentage points for α in the range of 0.01 to 0.00001. Koehler (1983) gives a fairly simple and accurate approximation for the extreme percentiles of the t distribution. Many statistical packages and other computer programs have standard routines for calculating percentage points. Some selected percentage points of the t distribution to determine Bonferroni intervals are given in Appendix Table XIII. For some further discussions and details about the Bonferroni statistic, see Dunn (1959, 1961) and Dunn and Massey (1965).
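Bonferroni's rule is easy to state in code when each comparison has a p-value. The following minimal Python sketch uses hypothetical p-values and function names of our own; it shows both the α/k form described above and the equivalent "adjusted p-value" form min(1, kp), which is compared directly with α.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / k, k being the number of tests."""
    k = len(p_values)
    return [p <= alpha / k for p in p_values]


def bonferroni_adjusted(p_values):
    """Equivalent form: compare min(1, k * p_i) with alpha."""
    k = len(p_values)
    return [min(1.0, k * p) for p in p_values]


p = [0.001, 0.012, 0.02, 0.04]   # hypothetical p-values for k = 4 comparisons
print(bonferroni_reject(p))       # [True, True, False, False]
print(bonferroni_adjusted(p))
```

With α = 0.05 and k = 4, each comparison is tested at 0.0125, so only the first two hypotheses are rejected.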


Remark: Holm (1979) introduced a modified Bonferroni test that consists of a class of sequentially rejective Bonferroni (SRB) procedures, which result in greater power than Bonferroni's test. Under Holm's SRB criterion, the hypotheses are examined in order of increasing p-value; if a hypothesis is rejected at the level $\alpha^* = \alpha/k$, then the denominator of $\alpha^*$ for the next test is k − 1, and the criterion continues to be modified in a stepwise manner, with the denominator of $\alpha^*$ decreased by 1 each time a hypothesis is rejected, so that tests can be conducted at successively higher significance levels. The procedure stops at the first hypothesis that is not rejected. The experimentwise error rate of the SRB procedures is ≤ α, as is that of the standard Bonferroni procedure. Shaffer (1986) introduced a refinement of Holm's SRB test, known as the modified sequentially rejective Bonferroni (MSRB) test, which is at least as powerful as Holm's test while maintaining an experimentwise error rate ≤ α.
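Holm's stepwise rule can be sketched as follows. This is a minimal Python illustration with a hypothetical function name and hypothetical p-values: the p-values are visited from smallest to largest, the j-th one (counting from 0) is compared with α/(k − j), and testing stops at the first hypothesis that is not rejected.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's sequentially rejective Bonferroni (SRB) procedure."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])  # smallest p-value first
    reject = [False] * k
    for step, i in enumerate(order):
        # step 0 uses alpha / k, step 1 uses alpha / (k - 1), and so on
        if p_values[i] <= alpha / (k - step):
            reject[i] = True
        else:
            break  # first non-rejection: this and all remaining hypotheses are retained
    return reject


p = [0.04, 0.001, 0.02, 0.012]   # hypothetical p-values for k = 4 comparisons
print(holm_reject(p))             # [True, True, True, True]
```

For the same p-values, the ordinary Bonferroni rule (every p-value compared with 0.0125) would reject only the hypotheses with p = 0.001 and p = 0.012, which illustrates the gain in power.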


20 Fleiss (1986, pp. 106-107) made the following recommendations regarding the use of the Bonferroni method. It should be preferred to Scheffé if the number of comparisons is less than a²; it should be preferred to Tukey if fewer than all a(a − 1)/2 comparisons are needed or if a relatively small number of other comparisons are to be made; and it should be preferred over Dunnett if the comparisons of interest are other than, or in addition to, those between each of several treatments and a control.

Dunn-Šidák's test

According to Bonferroni's or Dunn's test, given k comparisons or contrasts, each to be tested at the level $\alpha^*$, the overall error rate (α) cannot exceed $k\alpha^*$. For small values of $\alpha^*$, $k\alpha^*$ provides an excellent approximation to the upper bound. However, an even better approximation to the upper bound can be obtained by a multiplicative inequality given by Šidák (1967). It can be shown that²¹

$$\alpha \le 1 - (1 - \alpha^*)^k \le k\alpha^*.$$

Thus, instead of testing each contrast at the α/k level of significance, as in Bonferroni's or Dunn's method, each contrast can be tested at the $1 - (1 - \alpha)^{1/k}$ level of significance.²² In general, the use of the Dunn-Šidák method requires a slightly smaller critical value and hence leads to a more powerful test and a narrower confidence interval than Bonferroni's method. For example, let α = 0.05 and suppose there are five contrasts to be tested. Now, α/k = 0.05/5 = 0.01 and $1 - (1 - \alpha)^{1/k} = 1 - (1 - 0.05)^{1/5} = 0.0102$. Thus, the difference between the two significance levels is negligible. Games (1977) developed tables of critical values of the t statistic for use with the Dunn-Šidák method. The values are reprinted in Appendix Table XIV. Dunn (1961) and Games (1977) compared the Bonferroni and Dunn-Šidák procedures with the Tukey and Scheffé procedures and found that when there are many means in an experiment and the number of comparisons of interest is relatively small compared to the number of means, the Bonferroni and Dunn-Šidák procedures yield shorter confidence intervals than either the Tukey or the Scheffé procedure.
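The numerical example above (α = 0.05, five contrasts) can be reproduced with a short Python sketch; the function names are ours. It also evaluates $1 - (1 - \alpha^*)^k$, the experimentwise rate for k independent tests, to show that the Šidák level attains α exactly under independence while the Bonferroni level falls slightly short of it.

```python
def bonferroni_level(alpha, k):
    """Per-comparison level alpha / k."""
    return alpha / k


def sidak_level(alpha, k):
    """Per-comparison level 1 - (1 - alpha)^(1/k)."""
    return 1 - (1 - alpha) ** (1 / k)


def familywise_rate(per_test, k):
    """P(at least one type I error) for k independent tests at level per_test."""
    return 1 - (1 - per_test) ** k


alpha, k = 0.05, 5
print(round(bonferroni_level(alpha, k), 4), round(sidak_level(alpha, k), 4))  # 0.01 0.0102
print(round(familywise_rate(bonferroni_level(alpha, k), k), 4))              # 0.049
print(round(familywise_rate(sidak_level(alpha, k), k), 4))                   # 0.05
```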


21 This result was earlier proved by Dunn (1958) for certain special cases.

22 Assuming k independent significance tests, each using a significance level of $\alpha^*$, the probability of not committing a type I error in any of the k tests is $(1 - \alpha^*)^k$. The overall or experimentwise error rate (i.e., the probability of committing at least one type I error) is then given by $\alpha = 1 - (1 - \alpha^*)^k$. Solving for $\alpha^*$ yields $\alpha^* = 1 - (1 - \alpha)^{1/k}$.

Newman-Keuls's test

The test, also known as the Student-Newman-Keuls test, was first proposed by Newman (1939) and subsequently popularized by Keuls (1952). The procedure follows a predetermined criterion for grouping means into subsets and adjusts the overall error rate α according to the number of means to be tested. It uses the Studentized critical range values determined by

$$W_{(a)} = q[a, a(n-1); 1-\alpha]\sqrt{\frac{MS_W}{n}},$$

where $q[a, a(n-1); 1-\alpha]$ is the 100(1 − α)th percentage point of the Studentized range distribution with parameters a and a(n − 1). The procedure consists of arranging the a means $\bar{y}_{1.}, \bar{y}_{2.}, \ldots, \bar{y}_{a.}$ in ascending order as $\bar{y}_{(1)} \le \bar{y}_{(2)} \le \cdots \le \bar{y}_{(a)}$. It then divides the group means into mutually exclusive subsets so that means within a subset are not significantly different and the means from distinct subsets are different. For this purpose, the quantity $\bar{y}_{(a)} - \bar{y}_{(1)}$, called the range of the set of means, is calculated and compared with $W_{(a)}$. If it does not exceed $W_{(a)}$, the procedure stops and we conclude that the a means are not significantly different. If $\bar{y}_{(a)} - \bar{y}_{(1)}$ is greater than $W_{(a)}$, we divide $\bar{y}_{(1)}, \bar{y}_{(2)}, \ldots, \bar{y}_{(a)}$ into two groups, one containing $\bar{y}_{(a)}, \bar{y}_{(a-1)}, \ldots, \bar{y}_{(2)}$ and the other containing $\bar{y}_{(a-1)}, \bar{y}_{(a-2)}, \ldots, \bar{y}_{(1)}$. Next, each of the ranges in the two subgroups, viz., $\bar{y}_{(a)} - \bar{y}_{(2)}$ and $\bar{y}_{(a-1)} - \bar{y}_{(1)}$, is compared with $W_{(a-1)}$ determined by

$$W_{(a-1)} = q[a-1, a(n-1); 1-\alpha]\sqrt{\frac{MS_W}{n}}.$$

If a range does not exceed $W_{(a-1)}$, then the means in that group are not significantly different and no subset of that group is tested further; if neither range exceeds $W_{(a-1)}$, the procedure stops. If either or both ranges exceed $W_{(a-1)}$, then the a − 1 means in the corresponding group(s) are further divided into two groups of a − 2 means each, and the ranges for these groups are compared with $W_{(a-2)}$. The procedure is continued until a group of i means is found whose range does not exceed $W_{(i)}$ or all the means have been compared.
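The stepwise logic can be sketched in Python for a balanced design. The function name and the group means are hypothetical; the q values are the usual 5 percent points of the Studentized range for 20 error degrees of freedom, which the reader should check against a Studentized range table. As a design choice, the sketch examines subsets from the largest (a means) down to pairs and skips any subset lying inside one already found homogeneous. Supplying Duncan's R values instead of q gives the analogous sketch of Duncan's multiple range test described next.

```python
import math


def newman_keuls(means, n, ms_w, q_crit):
    """Student-Newman-Keuls stepwise range test for a balanced design.

    q_crit[p] must hold q[p, a(n - 1); 1 - alpha] for p = 2, ..., a.
    Returns the 1-based pairs of groups declared significantly different."""
    order = sorted(range(len(means)), key=lambda g: means[g])
    y = [means[g] for g in order]          # ybar_(1) <= ... <= ybar_(a)
    a = len(y)
    se = math.sqrt(ms_w / n)
    homogeneous = []                        # (lo, hi) subsets whose range <= W_(p)
    different = []
    for p in range(a, 1, -1):               # subsets of a, a - 1, ..., 2 means
        for lo in range(a - p + 1):
            hi = lo + p - 1
            # a subset lying inside a homogeneous subset is not tested
            if any(h_lo <= lo and hi <= h_hi for h_lo, h_hi in homogeneous):
                continue
            if y[hi] - y[lo] > q_crit[p] * se:      # range exceeds W_(p)
                different.append(tuple(sorted((order[lo] + 1, order[hi] + 1))))
            else:
                homogeneous.append((lo, hi))
    return sorted(different)


# Hypothetical balanced data: a = 5 groups of n = 5 and MS_W = 10, so a(n - 1) = 20.
q_20 = {2: 2.95, 3: 3.58, 4: 3.96, 5: 4.23}   # q[p, 20; 0.95]
print(newman_keuls([20, 22, 25, 29, 30], n=5, ms_w=10.0, q_crit=q_20))
# [(1, 4), (1, 5), (2, 4), (2, 5)]
```

Here the subsets {1, 2, 3} and {3, 4, 5} are found homogeneous, so none of the adjacent pairs is tested.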


Duncan’s Multiple Range test 

This test, developed by Duncan (1952, 1955), also ranks the group means by magnitude and then obtains subsets of means that are not significantly different. Rather than using a single prechosen error rate based on the total number of group means, the method adjusts the error rate according to the number of means involved in each comparison. The procedure is carried out in exactly the same way as the Newman-Keuls test, except that the observed ranges are now compared with Duncan's critical range values determined by

$$D_{(a)} = R[a, a(n-1); 1-\alpha]\sqrt{\frac{MS_W}{n}},$$

where $R[a, a(n-1); 1-\alpha]$ denotes the 100(1 − α)th percentage point based on Duncan's multiple range distribution with parameters a and a(n − 1). Some selected percentage points of Duncan's multiple range distribution are given in Appendix Table XII. The procedure has been found to be somewhat less conservative than the Newman-Keuls test.


Dunnett’s test 

The test, developed by Dunnett (1955), is especially designed for experiments that include a control group, where the researcher wishes to compare each of the remaining group means with the control. The procedure is a simple modification of the usual t test, with the change that the differences between means involving the control group, $|\bar{y}_{i.} - \bar{y}_{c.}|$, $i = 1, 2, \ldots, a-1$, are compared with the critical ranges determined by

$$D[a-1, a(n-1); 1-\alpha]\sqrt{MS_W\left(\frac{1}{n_i} + \frac{1}{n_c}\right)},$$

where $\bar{y}_{c.}$ is the mean of the control group and $D[a-1, a(n-1); 1-\alpha]$ denotes the 100(1 − α)th percentage point of the Dunnett distribution with parameters a − 1 and a(n − 1). Some selected percentage points of the Dunnett distribution are given in Appendix Table XI. Since the test does not make any comparisons among the noncontrol groups, it is generally more powerful than other procedures for comparing a control group with other groups. It is important to point out that the Dunnett procedure should be used when the only comparisons of interest are those of the individual treatments against the control.²³
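The comparisons with a control can be sketched as a short loop. In this Python illustration, the function name, the data, and the critical value are all hypothetical: `d_crit = 2.6` is a placeholder standing in for $D[a-1, a(n-1); 1-\alpha]$, which in practice must be read from a Dunnett table such as Appendix Table XI.

```python
import math


def dunnett_vs_control(means, ns, control, ms_w, d_crit):
    """Compare each noncontrol group mean with the control mean.

    d_crit stands for D[a - 1, a(n - 1); 1 - alpha] from a Dunnett table;
    it is supplied by the caller, not computed here."""
    results = []
    for i, (m, n) in enumerate(zip(means, ns)):
        if i == control:
            continue
        margin = d_crit * math.sqrt(ms_w * (1 / n + 1 / ns[control]))
        diff = m - means[control]
        results.append((i, diff, abs(diff) > margin))
    return results


# Hypothetical data: group 0 is the control; d_crit = 2.6 is a placeholder value.
for group, diff, significant in dunnett_vs_control(
        [10, 12, 15, 9, 18], [6] * 5, control=0, ms_w=8.0, d_crit=2.6):
    print(group, diff, significant)
```

Only a − 1 differences are examined, one per treatment, which is the source of the procedure's extra power when these are the only comparisons of interest.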


Remark: The SAS PROBMC function computes probabilities or quantiles from the 
one-sided or two-sided distribution of the Dunnett’s statistic with finite and infinite 
degrees of freedom for the variance estimate. For further information and numerical
examples of PROBMC function, see SAS Institute (1997, Chapter 28). 


MULTIPLE COMPARISONS FOR UNEQUAL SAMPLE SIZES AND VARIANCES 


Most of the multiple comparison procedures presented so far are appropriate for 
designs involving equal sample sizes and assuming equal population variances. 
We now summarize some of the procedures designed to be used with unequal 
sample sizes and variances. 


Unequal sample sizes 

For pairwise comparisons, Tukey (1953) and Kramer (1956, 1957) proposed a modification of Tukey's method in which the harmonic mean of $n_i$ and $n_{i'}$ is inserted for n in equation (2.19.14). The resulting intervals, known as Tukey-Kramer intervals, are given by

$$\bar{y}_{i.} - \bar{y}_{i'.} - q[a, N-a; 1-\alpha]\sqrt{\frac{1}{2}\left(\frac{1}{n_i} + \frac{1}{n_{i'}}\right)MS_W} < \mu_i - \mu_{i'} < \bar{y}_{i.} - \bar{y}_{i'.} + q[a, N-a; 1-\alpha]\sqrt{\frac{1}{2}\left(\frac{1}{n_i} + \frac{1}{n_{i'}}\right)MS_W}. \qquad (2.19.18)$$

Remark: Winer (1962; 1971, p. 216) and Miller (1966, p. 43; 1981, p. 43) proposed the idea for general contrasts, where the harmonic mean of the unequal $n_i$'s ($i = 1, 2, \ldots, a$) is substituted for n in equation (2.19.12). Simulation studies by Dunnett (1980a) showed that for pairwise comparisons the Tukey-Kramer intervals (2.19.18) provide approximately the nominal probability coverage. Later, Hayter (1984) proved analytically that the probability coverage is always conservative (≥ 1 − α). However, the Tukey-Kramer-Miller-Winer procedures are not robust to unequal variances (see, e.g., Howell and Games (1973); Keselman et al. (1975); Keselman and Rogan (1978)).
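Interval (2.19.18) is straightforward to compute. The following Python sketch (function name ours) applies it to groups 1 and 2 of Example 8 (means 15 and 4, sample sizes 3 and 4, $MS_W = 6.333$, N − a = 12), assuming the standard Studentized range table value $q[5, 12; 0.95] = 4.51$, which is not given in the text above.

```python
import math


def tukey_kramer_interval(mean_i, mean_j, n_i, n_j, ms_w, q_crit):
    """Interval (2.19.18) for mu_i - mu_j with unequal sample sizes.

    q_crit is q[a, N - a; 1 - alpha], read from a Studentized range table."""
    diff = mean_i - mean_j
    half = q_crit * math.sqrt(0.5 * (1 / n_i + 1 / n_j) * ms_w)
    return diff - half, diff + half


lo, hi = tukey_kramer_interval(15, 4, 3, 4, ms_w=6.333, q_crit=4.51)
print(round(lo, 2), round(hi, 2))  # 4.87 17.13
```

With $n_i = n_{i'} = n$ the half-width reduces to $q\sqrt{MS_W/n}$, the equal-sample-size Tukey interval.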


23 If the researcher is interested in comparing the combination of groups to the control group, 
Scheffé’s test or a generalization of Dunnett’s test proposed by Shaffer (1977) may be used. 
For pairwise comparisons involving k simultaneous intervals, the Bonferroni intervals are given by

$$\bar{y}_{i.} - \bar{y}_{i'.} - t[N-a, 1-\alpha/2k]\sqrt{MS_W\left(\frac{1}{n_i} + \frac{1}{n_{i'}}\right)} < \mu_i - \mu_{i'} < \bar{y}_{i.} - \bar{y}_{i'.} + t[N-a, 1-\alpha/2k]\sqrt{MS_W\left(\frac{1}{n_i} + \frac{1}{n_{i'}}\right)}.$$

In the one-way classification involving all pairwise comparisons, k is a(a − 1)/2. However, k may be smaller if some comparisons are known a priori to be of no interest.

For pairwise comparisons, Hochberg (1974) proposed a modification of the Tukey-Kramer intervals given by

$$\bar{y}_{i.} - \bar{y}_{i'.} - m[a(a-1)/2, N-a; 1-\alpha]\sqrt{MS_W\left(\frac{1}{n_i} + \frac{1}{n_{i'}}\right)} < \mu_i - \mu_{i'} < \bar{y}_{i.} - \bar{y}_{i'.} + m[a(a-1)/2, N-a; 1-\alpha]\sqrt{MS_W\left(\frac{1}{n_i} + \frac{1}{n_{i'}}\right)},$$

where $m[p, \nu; 1-\alpha]$ is the 100(1 − α)th percentage point of the Studentized maximum modulus distribution for p = a(a − 1)/2 pairwise comparisons with ν = N − a degrees of freedom for the error. (For a definition of the Studentized maximum modulus distribution, see Appendix J.) Some selected percentage points of the Studentized maximum modulus distribution are given in Appendix Table XV. The studies by Tamhane (1979) and Dunnett (1980a) have shown that the procedure is not robust to unequal variances when the sample sizes are also unequal.
Similarly, Spjøtvoll and Stoline (1973) proposed the intervals

$$\bar{y}_{i.} - \bar{y}_{i'.} - q'[a, N-a; 1-\alpha]\sqrt{MS_W}\,\max\left(\frac{1}{\sqrt{n_i}}, \frac{1}{\sqrt{n_{i'}}}\right) < \mu_i - \mu_{i'} < \bar{y}_{i.} - \bar{y}_{i'.} + q'[a, N-a; 1-\alpha]\sqrt{MS_W}\,\max\left(\frac{1}{\sqrt{n_i}}, \frac{1}{\sqrt{n_{i'}}}\right),$$

where $q'[p, \nu; 1-\alpha]$ is the 100(1 − α)th percentage point of the Studentized augmented range distribution for p means with ν degrees of freedom for the error. Tables of $q'[p, \nu; 1-\alpha]$ have been prepared by Stoline (1978). Some selected percentage points of the Studentized augmented range distribution are given in Appendix Table XVI. These intervals, however, are conservative in the sense that the true coverage probability is greater than or equal to 1 − α.


Remark: Ury (1976) studied some of the foregoing procedures, including the Scheffé and Dunn-Šidák intervals, and found that the choice of the "best interval" depends upon the particular combination of sample sizes, significance level, number of groups, and error degrees of freedom. Stoline (1981) made a detailed comparison of all the foregoing procedures and recommended the general use of the Tukey-Kramer intervals.


Unequal population variances 
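For pairwise comparisons, Games and Howell (1976) proposed a procedure in which the quantity $MS_W(1/n_i + 1/n_{i'})$ in equation (2.19.18) is replaced by $S_i^2/n_i + S_{i'}^2/n_{i'}$ for the difference involving the (i, i')-pair of means, where $S_i^2$ is the sample variance of the i-th group. The error degrees of freedom N − a in $q[a, N-a; 1-\alpha]$ is further replaced by

$$\nu_{ii'} = \frac{\left(S_i^2/n_i + S_{i'}^2/n_{i'}\right)^2}{\left(S_i^2/n_i\right)^2/(n_i - 1) + \left(S_{i'}^2/n_{i'}\right)^2/(n_{i'} - 1)}.$$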


Extensive Monte Carlo studies by Tamhane (1979) and Dunnett (1980b) have 
shown that the procedure can give nonconservative α values, as high as 0.84,
with unequal variances. For a modification of (2.19.18) based on the Studentized 
augmented range distribution, see Hochberg (1976). 
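The two ingredients of the Games-Howell comparison for one (i, i') pair, the approximate degrees of freedom $\nu_{ii'}$ and the half-width, can be sketched in Python. The function names and the sample variances are hypothetical, and the Studentized range critical value $q[a, \nu_{ii'}; 1-\alpha]$ is left to the caller, since $\nu_{ii'}$ is generally not an integer and table values must be interpolated.

```python
import math


def welch_df(s2_i, n_i, s2_j, n_j):
    """Approximate error degrees of freedom nu_ii' for the (i, i') pair."""
    v_i, v_j = s2_i / n_i, s2_j / n_j
    return (v_i + v_j) ** 2 / (v_i ** 2 / (n_i - 1) + v_j ** 2 / (n_j - 1))


def games_howell_margin(s2_i, n_i, s2_j, n_j, q_crit):
    """Half-width q * sqrt((S_i^2/n_i + S_i'^2/n_i') / 2); q_crit = q[a, nu_ii'; 1 - alpha]."""
    return q_crit * math.sqrt(0.5 * (s2_i / n_i + s2_j / n_j))


# Hypothetical pair: S_i^2 = 4 with n_i = 5 and S_i'^2 = 9 with n_i' = 10.
print(round(welch_df(4.0, 5, 9.0, 10), 2))  # 11.56
```

When the two variances and sample sizes are equal, $\nu_{ii'}$ reduces to $n_i + n_{i'} - 2$.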

Dunnett (1980b) suggested a modification of the foregoing procedure in which the critical value $q[a, \nu_{ii'}; 1-\alpha]$ is replaced by

$$\frac{q[a, n_i - 1; 1-\alpha]\,(S_i^2/n_i) + q[a, n_{i'} - 1; 1-\alpha]\,(S_{i'}^2/n_{i'})}{(S_i^2/n_i) + (S_{i'}^2/n_{i'})}.$$

It should be noted that the critical value given above corresponds to Cochran's (1964) approximate solution to the Behrens-Fisher problem, and thus the procedure is expected to be conservative (Dunnett (1980b)).

Dunnett (1980b) proposed another modification that utilizes the same statistic as in Games and Howell, but the critical value $q[a, \nu_{ii'}; 1-\alpha]$ is replaced by the critical value $m[a(a-1)/2, \nu_{ii'}; 1-\alpha]$ of the Studentized maximum modulus distribution. Dunnett (1980b) found that this procedure is also conservative with unequal variances.


Remark: Weerahandi (1995) proposed a modification of Scheffé's procedure of
multiple comparison given by (2.19.16) to the case of unequal variances.
is, however, too complicated and mathematically intractable for the practitioner to use 
in routine work. 


2.20 EFFECTS OF DEPARTURES FROM ASSUMPTIONS 
UNDERLYING THE ANALYSIS OF VARIANCE MODEL 


In making inferences from the analysis of variance model (2.1.1), we have made 
the following assumptions: 


(i) the $e_{ij}$'s are normally distributed;
(ii) the $e_{ij}$'s have the same variance $\sigma_e^2$; and
(iii) the $e_{ij}$'s are independently distributed.


It stands to reason that in real-life applications none of the preceding assumptions can be expected to be completely satisfied. One rarely draws independent random samples from populations that are exactly normally distributed
with precisely equal variances. The question naturally arises: What are the ef- 
fects of any departure from the assumptions of the model on the inferences 
made? For a thorough discussion of the topic, the reader is referred to Scheffé 
(1959, pp. 331-369), Miller (1986, Chapter 3), and Snedecor and Cochran 
(1989, Chapter 15). Here, we briefly summarize some of the main findings. 


DEPARTURES FROM NORMALITY 


For Model I, many investigations have been made to study the effect of nonnor- 
mality on both the level of significance and the power of the F test employed in 
the analysis of variance. Both analytic results (see, e.g., Scheffé (1959, pp. 345-351)) and empirical studies by Pearson (1931), Geary (1947), Gayen (1950),
Box and Anderson (1955), Boneau (1960, 1962), Srivastava (1959), Bradley 
(1964), Tiku (1964, 1971), and Donaldson (1968) attest to the fact that the 
failure to satisfy this assumption has little effect on the F test. Thus, if depar- 
ture from normality is not too extreme, the lack of normality does not present 
any serious problem, since the means will follow the normal distribution more 
closely than the variates themselves. Both the level of significance and the 
power of the F test are only slightly affected by any departure from normality. 
However, extreme nonnormality may result in a biased test. In this connection, 
it is important to mention that any departure from the kurtosis of the normal 
distribution (either more or less peaked) is much more serious than the skewness
of the distribution in terms of the effects on inferences. Also, platykurtic (flat) 
and leptokurtic (peaked) distributions have little effect on the significance level 
but can have a marked effect on power, particularly when the sample sizes are 
small. Furthermore, only highly skewed distributions would have any marked 
effect either on the level of significance or the power of the F test. 

The point estimates of the factor level means and their contrasts are unbi- 
ased irrespective of whether populations are normal or not. Hence, the F test 
is generally robust against any departures from normality (in skewness and/or 
kurtosis) if sample sizes are large or even if moderately large. For instance, the 
nominal level of significance might be 0.05 whereas the actual level for a non- 
normal population might vary from .044 to .052 depending on the sample size 
and the magnitude of the kurtosis (Box and Anderson (1955)). Generally, the 
actual level of significance in the presence of negative kurtosis (platykurtic) is slightly higher than the specified one, and the real power of the test is slightly higher than for a normal population. If the underlying population has positive kurtosis (leptokurtic), the actual power of the test will be slightly lower than for a normal population (Glass et al. (1972)). Single interval estimates of the factor level means and contrasts and some of the multiple comparison methods are also not much affected by the lack of normality, provided the sample sizes are not too small. The robustness of multiple comparison tests in general has not been as thoroughly studied. Among the few studies in this area is that of Brown (1974). A number of studies, however, have investigated the robustness of several multiple comparison procedures, including Tukey and Scheffé, for exponential and chi-square distributions and found little effect on both significance level and power (see, e.g., Petrinovich and Hardyck (1969); Keselman and Rogan (1978)). Dunnett (1982) reported that Tukey's procedure is conservative with respect to both significance level and power for long-tailed distributions and in the presence of outliers. Similarly, Ringland (1983) found that Scheffé's procedure was conservative for distributions prone to outliers.

For Model II, the lack of normality has more serious implications than for Model I. The estimates of the variance components are still unbiased, but their variances depend on the kurtosis of the distribution, and the actual confidence coefficients for interval estimates of $\sigma_e^2$, $\sigma_\alpha^2$, and $\sigma_\alpha^2/\sigma_e^2$ may be substantially different from the specified one (Singhal and Sahai (1992)). Furthermore, when
testing the null hypothesis that the variance of a random effect is some specified 
value different from zero, the test is not robust to the assumption of normality. 
For some illustrations and numerical results, the reader is referred to Arvesen 
and Schmitz (1970) and Arvesen and Layard (1975). However, if one is concerned only with a test of the hypothesis $\sigma_\alpha^2 = 0$, then slight departures from
normality have only minor consequences for the conclusions reached when the 
sample size is reasonably large (see, e.g., Tan and Wong (1980); Singhal and 
Singh (1984); Singhal et al. (1988)). 


DEPARTURES FROM EQUAL VARIANCES 


Both the analytical derivations by Box (1954a) and the empirical studies cited 
earlier indicate that if the variances are unequal, the F test for the equality of 
means under Model I is only slightly affected with respect to moderate vio- 
lations of this assumption provided the sample sizes do not differ greatly and 
the parent populations are approximately normally distributed²⁴ (Glass et al. (1972)). Generally, unequal error variances raise the actual level of significance slightly above the specified level and result in a slight elevation of the power function to a degree related to the magnitude of differences among variances (Box (1954a)). If larger variances are associated with larger sample sizes, the level of significance will be slightly less than the nominal value, and if they are associated with smaller sample sizes, it will be slightly greater than the nominal value (Horsnell (1953); Kohr and Games (1974)). Similarly, if the sample sizes do not differ greatly, Scheffé's method for multiple comparisons is only slightly affected by any lack of homogeneity of error variances. Thus, the F test and related procedures are fairly robust against violations of the equal error variance assumption, provided the sample sizes are nearly equal. Comparisons of factor level means based on a single contrast, however, can be seriously affected by unequal variances even when sample sizes are equal.

24 When the variances are unequal, an approximate test similar to the approximate t test for two groups with unequal variances may be used (Welch (1956)). For a description of the test and some illustrative examples, see Zar (1996, pp. 189-190). The method has been shown to perform rather well when population variances are unequal (Kohr and Games (1974); Levy (1978a); Dijkstra and Werter (1981)). For some other approaches to analysis of variance involving heterogeneous variances, see James (1951), Brown and Forsythe (1974a,b), Bishop and Dudewicz (1978), Clinch and Keselman (1982), Krutchkoff (1988), Wilcox (1988, 1993), and Alexander and Govern (1994). For a survey and comparisons of traditional ANOVA alternatives with other alternative procedures, see Coombs et al. (1996).

On the other hand, when different numbers of cases appear in various sam- 
ples, even relatively small departures from the assumption of homogeneous 
variances can have very serious consequences for the validity of the final in- 
ference (see, e.g., Scheffé (1959, p. 351); Welch (1956); James (1951); Box 
(1954a); Brown and Forsythe (1974a); Bishop and Dudewicz (1978); Tan and 
Tabatabai (1986)). According to Box (1954a), for samples of unequal sizes, 
even a small violation of this assumption can have a marked effect on the level 
of significance. The actual significance level will exceed the nominal level when 
smaller samples are drawn from more heterogeneous populations and will be less than the nominal value when the smaller samples are drawn from more homogeneous populations. Furthermore, Rogan and Keselman (1977) found that the actual significance level may be appreciably larger when the variances are quite heterogeneous. Moreover, the effect of unequal variances is not appreciably reduced simply by increasing the sample sizes, as long as the ratios of the sample sizes remain unchanged.
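Box's observation about unequal sample sizes can be illustrated with a small Monte Carlo sketch of our own (it is not a reproduction of any of the cited studies). All population means are equal, so every rejection is a type I error; three groups of sizes 4, 8, and 16 are simulated under equal variances, with the larger variances on the smaller samples, and with the larger variances on the larger samples. The critical value $F[2, 25; 0.95] \approx 3.39$ is taken from a standard F table; the design, seed, and number of repetitions are arbitrary choices.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F = MS_B / MS_W for a list of samples."""
    a = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(len(g) * m for g, m in zip(groups, means)) / n_total
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_w = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_b / (a - 1)) / (ss_w / (n_total - a))


def empirical_level(ns, sds, f_crit, reps=4000, seed=1):
    """Fraction of simulated experiments (all means equal) with F > f_crit."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        groups = [[rng.gauss(0.0, sd) for _ in range(n)] for n, sd in zip(ns, sds)]
        if f_statistic(groups) > f_crit:
            rejections += 1
    return rejections / reps


F_CRIT = 3.39      # F[2, 25; 0.95]: a = 3 groups, N = 28 observations
NS = [4, 8, 16]    # unequal sample sizes
equal = empirical_level(NS, [2.0, 2.0, 2.0], F_CRIT)
small_noisy = empirical_level(NS, [4.0, 2.0, 1.0], F_CRIT)  # small samples, large variances
small_quiet = empirical_level(NS, [1.0, 2.0, 4.0], F_CRIT)  # small samples, small variances
print(equal, small_noisy, small_quiet)
```

In runs of this kind, the first rate stays near the nominal 0.05, the second lies well above it, and the third lies well below it, in the directions described above.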

Krutchkoff (1988) made an extensive simulation study in order to determine 
the size and power of several analysis of variance procedures, including the F 
test, Kruskal-Wallis test and a new procedure called the K test. It was found 
that both the F test and the Kruskal-Wallis test are highly sensitive to heterogeneity of variances, whereas the K test is relatively insensitive to it. The Kruskal-Wallis test, however, is not as sensitive to unequal error variances as the F test, and it was found to be more robust to nonnormality (when the error variances are equal) than either the F test or the K test. A more recent study by Lix et al. (1996) seems to indicate that violations of the variance homogeneity assumption can have serious consequences for control of the type I error rate regardless of whether group sizes are equal or unequal, but particularly in the latter case. The study also found that all the parametric alternatives to the analysis of variance F test had superior performance when the variance homogeneity assumption was violated. Furthermore, when the group sizes were equal, the degree of nonnormality had little bearing on the type I error rate of the F test, whether variances were equal or unequal: the error rates remained close to the nominal level regardless of the degree of nonnormality when variances were equal, and were always inflated across the nonnormal distributions when variances were unequal. The pattern was also evident when group sizes were unequal. Thus, whenever possible, the experimenter should try to achieve a nearly equal number of cases in each factor level unless the assumption of equal population variances can reasonably be assured in the experimental context. It should be observed that the use of equal sample sizes for all factor levels not only tends to minimize the effects of unequal variances on the F test, but also simplifies the computational procedure.

For Model II, the effect of unequal error variances on the F test is the same as for the fixed effects model: minimal for balanced designs, but potentially serious for unbalanced designs. However, lack of homoscedasticity, or unequal error variances, can seriously affect inferences about variance components even when all factor levels contain equal sample sizes.


DEPARTURES FROM INDEPENDENCE OF ERROR TERMS 


Lack of independence can result from biased measurements or possibly from a poor allocation of treatments to experimental units. Departure from independence could also arise in an experiment in which experimental units or plots are laid out in a field so that adjacent plots give similar yields. Lack of independence can likewise result from correlation in time rather than in space. The most frequent violation of the independence assumption occurs when the observations are recorded over some time-space coordinate in which adjacent observations tend to be correlated. Nonindependence of the error terms can have important effects on inferences for both Models I and II. If this assumption is not met, both the level of significance and the power of the F test may be strongly affected, and very serious errors in inferences can be made (Scheffé (1959, p. 945)). The direction of the effect depends on the nature of the dependence of the error terms. In most cases encountered in practice, the dependence tends to make the value of the F ratio too large, so that the computed p-value is smaller than it should be and the actual significance level exceeds the nominal one (although the opposite can also be the case). Thus, positive correlations among the error terms within a factor level may cause too many significant results based on the F test, and the effect on the t test may be even greater.

Since the violation of this assumption is often difficult to remedy, every possible effort should be made to obtain independent random samples. The use of random sampling or randomization at various stages of the study can be a most important protection against dependence of the error terms. In general, great care should be taken to see that the data are based on independent observations, both between and within groups; that is, each observation is in no way related to any of the other observations. Although dependency among the error terms creates a special problem in any analysis of variance, it is not required that the observations themselves be completely independent for Model II to apply. However, just as Model I is not robust to violations of the independence assumption, neither is Model II. Violation of this assumption generally results in declaring too many significant results in the F test. The effects on various point and interval estimates of $\sigma_\alpha^2$, however, are unknown.


2.21 TESTS FOR DEPARTURES FROM ASSUMPTIONS 
OF THE MODEL 


As we have seen in the preceding section, the analysis of variance procedure 
is robust and can tolerate certain departures from the specified assumptions. It 
is, nevertheless, recommended that whenever a departure is suspected it should 
be investigated. In this section, we briefly discuss the tests for normality and 
homoscedasticity. 


Remark: Before carrying out the formal statistical procedures for testing normality and 
homogeneity described here, it may be fairly informative and useful to explore the data 
graphically. For example, one can use box-plots for different groups and/or within group 
histograms to see if the distribution of values in each group is symmetric and free of 
any gross outliers and other anomalies in the data; and if the spread of the data across 
groups is fairly constant. Ideally, if the analysis of variance assumptions are satisfied, 
box-plots should be symmetric and the spreads across the groups should be nearly the 
same. However, in many practical problems involving small sample sizes, the skewness 
and homogeneity may be difficult to evaluate in this way. Box-plots are discussed in most 
introductory statistics textbooks, or one may refer to a book on exploratory data analysis 
(see, e.g., Tukey (1977); Chambers et al. (1983); Hoaglin et al. (1983); Cleveland 
(1985)). Box-plots and related graphical techniques are also available in most statistical 
packages currently being used for data analysis. For a discussion of analysis of variance 
from the viewpoint of exploratory data analysis, see Hoaglin et al. (1991). 


TESTS FOR NORMALITY 


A relatively simple technique to determine the appropriateness of the assumption of normality is to plot the data points on normal probability paper. If a straight line can be drawn through the plotted points, the assumption of normality is considered to be reasonable.
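For readers working without probability paper, the plot can be produced numerically. The Python sketch below is illustrative only: the helper names `normal_plot_points` and `pearson_r` are hypothetical, and the plotting positions (i − 0.375)/(n + 0.25) (Blom's choice) are an assumption rather than something prescribed here. It pairs each ordered observation with a standard normal score and summarizes how straight the plot is by the correlation between the two coordinates, using the sample of Example 1.

```python
import math
from statistics import NormalDist


def normal_plot_points(data):
    """Pair each ordered observation with a standard normal score.

    Blom's plotting positions (i - 0.375) / (n + 0.25) are an assumed choice;
    other plotting positions give very similar plots.
    """
    ys = sorted(data)
    n = len(ys)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return scores, ys


def pearson_r(xs, ys):
    """Ordinary product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Sample of Example 1 (Shapiro-Wilk W test) in this section.
sample = [2.4, 2.7, 2.6, 3.4, 3.2, 3.5, 3.2, 3.4, 3.6, 3.5]
scores, ordered = normal_plot_points(sample)
r = pearson_r(scores, ordered)  # close to 1 when the points lie near a straight line
print(f"probability-plot correlation r = {r:.4f}")
```

Plotting `ordered` against `scores` with any graphics package gives the normal probability plot itself; the correlation is only an informal summary and is not referred to any of the tables in this section.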

We now consider some formal tests for normality. They are the chi-square goodness-of-fit test and the tests for skewness and kurtosis, which are often used as supplements to the chi-square test.


Chi-square goodness-of-fit test 

In this test the data are grouped into classes to form a frequency distribution, and the sample mean and standard deviation are calculated. From these quantities a normal distribution is fitted and the expected frequencies in each class are obtained. Let $o_i$ and $e_i$ represent the observed and expected frequencies for the $i$-th class. Then the test criterion is based on the quantity

$$X^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}, \qquad (2.21.1)$$

where the summation is taken over all the classes. If the data actually come from a normal distribution, then the quantity (2.21.1) follows approximately a chi-square distribution with $k - 3$ degrees of freedom, where $k$ is the number of classes used in the calculation of $X^2$; three degrees of freedom are lost because the frequencies must sum to the sample size and because two parameters, the mean and the standard deviation, are estimated from the data. If the data come from some other distribution, the observed $o_i$ will tend to agree poorly with the values of $e_i$ that are expected on the assumption of normality, and the calculated value of $X^2$ will become large. Consequently, large values of $X^2$ lead to the rejection of the hypothesis of normality. Thus, if the calculated value of the statistic $X^2$ exceeds $\chi^2[k - 3, 1 - \alpha]$, the $100(1 - \alpha)$th percentage point of the chi-square distribution with $k - 3$ degrees of freedom, we reject the null hypothesis that the sample is selected from a normal population.

For the validity of the chi-square test, it is required that the expected frequencies $e_i$ should not be too small. Small expected values are likely to occur only in the extreme classes. A working rule is that the two extreme expectations may each be as low as 1, provided that most of the other expected values exceed 5. If the expected values are lower than 1, classes are combined to give an expectation of at least 1. For a more detailed discussion of these questions refer to Cochran (1954), Larntz (1978), and Koehler and Larntz (1980).
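A minimal Python sketch of the computation of (2.21.1), assuming user-supplied class boundaries with open-ended extreme classes, and a normal distribution fitted with the sample mean and the usual (n − 1)-divisor standard deviation. The function name `chi_square_normality` and the synthetic sample (200 values placed at quantiles of a normal distribution with mean 10 and standard deviation 2) are illustrative assumptions, not data from the text. The returned X² is to be compared with χ²[k − 3, 1 − α] from a chi-square table, after pooling classes whose expected frequencies are too small.

```python
import math
from statistics import NormalDist, mean, stdev


def chi_square_normality(data, edges):
    """Return (X2, df, expected) for the chi-square test of normality (2.21.1).

    `edges` are the interior class boundaries in increasing order; classes are
    (-inf, e1], (e1, e2], ..., (e_last, inf), so k = len(edges) + 1.
    """
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))  # normal fitted to the sample
    bounds = [-math.inf] + list(edges) + [math.inf]
    x2, expected = 0.0, []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        o = sum(1 for y in data if lo < y <= hi)   # observed frequency o_i
        e = n * (fitted.cdf(hi) - fitted.cdf(lo))   # expected frequency e_i
        expected.append(e)
        x2 += (o - e) ** 2 / e
    k = len(bounds) - 1
    return x2, k - 3, expected


# Synthetic illustration: 200 values placed at normal quantiles of N(10, 2).
parent = NormalDist(10, 2)
sample = [parent.inv_cdf((i - 0.5) / 200) for i in range(1, 201)]
x2, df, expected = chi_square_normality(sample, edges=[7, 9, 10, 11, 13])
print(f"X^2 = {x2:.3f} on {df} df; smallest expected frequency = {min(expected):.1f}")
```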


Test for skewness 

One indication of nonnormality occurs when the relative frequency histogram for the sample data is highly skewed to either the left or the right. A measure of the amount of skewness is given by $\mu_3$, the third moment about the population mean, which is the average value of $(x - \mu)^3$ taken over the population. The skewness is positive or negative according to the sign of $\mu_3$. If low values are clustered close to the mean $\mu$ but high values extend far above the mean, $\mu_3$ will be positive, since the large positive contributions of $(x - \mu)^3$ when $x$ exceeds $\mu$ will predominate over the smaller negative contributions of $(x - \mu)^3$ obtained when $x$ is less than $\mu$. Similarly, $\mu_3$ will be negative when the lower tail is the extended one. The meaning of positive and negative values of $\mu_3$ is illustrated in Figure 2.6.


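The actual measure of skewness is given by the coefficient of skewness defined as

$$\gamma_1 = \frac{\mu_3}{\sigma^3} = \frac{\mu_3}{\mu_2^{3/2}}, \qquad (2.21.2)$$

where $\mu_2 = \sigma^2$ is the second moment about the population mean.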
The quantity (2.21.2) is independent of the measurement scale and can be estimated from the sample data by

$$\hat{\gamma}_1 = \frac{m_3}{m_2^{3/2}}, \qquad (2.21.3)$$

where

$$m_3 = \sum_{i=1}^{n} (y_i - \bar{y})^3 / n \quad \text{and} \quad m_2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 / n.$$

FIGURE 2.6 Curves Exhibiting Positive and Negative Skewness and Symmetrical Distribution. [Three panels of $f(y)$: $\gamma_1 > 0$, $\gamma_1 < 0$, $\gamma_1 = 0$.]

A test of the null hypothesis that the sample data are selected from a normal population can be based on the statistic

$$Z = \frac{\hat{\gamma}_1}{\sqrt{6/n}}, \qquad (2.21.4)$$

which, under the null hypothesis, is approximately a standard normal variate. The assumption that $Z$ has approximately a standard normal distribution is accurate enough for this test if $n$ exceeds 150. For sample sizes between 25 and 200 the one-tailed 5 percent and 10 percent significance values of $\hat{\gamma}_1$ have been determined from a more accurate approximation and appear in Pearson and Hartley (1970). Some of these values are reprinted in Appendix Table XVII(a).
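The estimate (2.21.3) and the statistic (2.21.4) take only a few lines of Python. The sketch below is a hypothetical helper (`skewness_test` is not from the text); it uses divisor n in the moments, as in the definitions of $m_2$ and $m_3$, and is checked on two tiny made-up samples, a symmetric one and one with a long right tail.

```python
import math


def skewness_test(data):
    """Return (g1, z): the sample skewness (2.21.3) and the z of (2.21.4).

    Moments use divisor n, as in m2 and m3 of the text.
    """
    n = len(data)
    ybar = sum(data) / n
    m2 = sum((y - ybar) ** 2 for y in data) / n
    m3 = sum((y - ybar) ** 3 for y in data) / n
    g1 = m3 / m2 ** 1.5
    z = g1 / math.sqrt(6 / n)  # approximately N(0, 1) under normality for large n
    return g1, z


# A perfectly symmetric sample has zero skewness.
g1_sym, _ = skewness_test([1, 2, 3, 4, 5])
# A sample with a long right tail gives positive skewness.
g1_right, z_right = skewness_test([0, 0, 0, 1])
print(g1_sym, round(g1_right, 4), round(z_right, 4))
```

For n below about 150 the z value should not be referred to the normal table; the sample skewness should instead be compared with the tabulated percentage points.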


Test for kurtosis 

A second kind of departure from normality can be detected by examining the kurtosis of the distribution. The kurtosis of a distribution is measured by the quantity

$$\gamma_2 = \frac{\mu_4}{\sigma^4} = \frac{\mu_4}{\mu_2^{2}}, \qquad (2.21.5)$$

where $\mu_4$ is the fourth moment about the population mean and $\gamma_2$ is called the coefficient of kurtosis. Unlike the coefficient of skewness ($\gamma_1$), $\gamma_2$ measures the heaviness of the tails of a distribution. For the normal population $\mu_4 = 3\sigma^4$, so that $\gamma_2 = 3$. Heavier-tailed distributions, such as a t distribution, will have a larger pile-up near $\mu$ and so $\gamma_2 > 3$. Lighter-tailed distributions will have less pile-up about $\mu$ and so $\gamma_2 < 3$. This is illustrated in Figure 2.7.


FIGURE 2.7 Curves Exhibiting Positive and Negative Kurtosis and the Normal Distribution. [Three panels of $f(y)$: $\gamma_2 > 3$, $\gamma_2 < 3$, $\gamma_2 = 3$.]


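The quantity (2.21.5) can be estimated by

$$\hat{\gamma}_2 = \frac{m_4}{m_2^{2}}, \qquad (2.21.6)$$

where

$$m_4 = \sum_{i=1}^{n} (y_i - \bar{y})^4 / n \quad \text{and} \quad m_2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 / n.$$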


For large sample sizes ($n > 1{,}000$) a test of the null hypothesis that $\gamma_2 = 3$ can be based on the statistic

$$Z = \frac{\hat{\gamma}_2 - 3}{\sqrt{24/n}}, \qquad (2.21.7)$$

where again $Z$ has approximately a standard normal distribution. Unfortunately, one seldom encounters a sample with 1,000 or more observations, so the test statistic (2.21.7) has very little practical utility. For smaller sample sizes, however, upper and lower percentage points of the distribution of $\hat{\gamma}_2$ have been tabulated and can be used to test the null hypothesis. Tables of critical values are given in Pearson and Hartley (1970). Some of these values are reprinted in Appendix Table XVII(b).
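Similarly, a Python sketch of (2.21.6) and (2.21.7), under the same conventions (divisor n; the helper name `kurtosis_test` is hypothetical). The illustration uses an artificial sample of +1's and −1's, the lightest-tailed case possible, for which the sample kurtosis is exactly 1.

```python
import math


def kurtosis_test(data):
    """Return (g2, z): the sample kurtosis (2.21.6) and the z of (2.21.7)."""
    n = len(data)
    ybar = sum(data) / n
    m2 = sum((y - ybar) ** 2 for y in data) / n
    m4 = sum((y - ybar) ** 4 for y in data) / n
    g2 = m4 / m2 ** 2                  # equals 3 for a normal population
    z = (g2 - 3) / math.sqrt(24 / n)   # large-sample approximation only (n > 1,000)
    return g2, z


# 1,200 values of +1 and -1: a two-point sample, as light-tailed as possible.
g2_two_point, z_two_point = kurtosis_test([-1, 1] * 600)
print(g2_two_point, round(z_two_point, 2))
```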

Geary (1935, 1936) developed an alternative test criterion for kurtosis based on the statistic

$$G = \frac{\text{Mean deviation}}{\text{Standard deviation}} = \frac{\sum_{i=1}^{n} |y_i - \bar{y}| / n}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2 / n}}. \qquad (2.21.8)$$

The significance values of the statistic (2.21.8) have been tabulated for sample sizes down to $n = 11$. If $y$ is a normal deviate, the value of $G$ when determined for the whole population is $\sqrt{2/\pi} = 0.7979$. Positive kurtosis ($\gamma_2 > 3$) yields lower values and negative kurtosis ($\gamma_2 < 3$) higher values of $G$. When applied to the same data, the statistics $\hat{\gamma}_2$ and $G$ usually agree well in their conclusions. The advantages of $G$ are that tables are available for smaller sample sizes and that $G$ is relatively easier to compute.
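Geary's ratio (2.21.8) can be sketched as follows. The helper `geary_g` is a hypothetical name; the check against the population value uses an artificial sample of 5,000 standard normal quantiles, which is an assumption made only to show that G comes out close to √(2/π) for normal-looking data.

```python
import math
from statistics import NormalDist


def geary_g(data):
    """Geary's ratio (2.21.8): mean deviation over standard deviation (divisor n)."""
    n = len(data)
    ybar = sum(data) / n
    mean_dev = sum(abs(y - ybar) for y in data) / n
    sd = math.sqrt(sum((y - ybar) ** 2 for y in data) / n)
    return mean_dev / sd


# Population value for a normal distribution: sqrt(2/pi).
g_normal_population = math.sqrt(2 / math.pi)

# Values placed at standard normal quantiles should give G close to that value.
n = 5000
quantile_sample = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
print(round(g_normal_population, 4), round(geary_g(quantile_sample), 4))
```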


OTHER TESTS FOR NORMALITY 


The foregoing procedures are some of the classical tests of normality. Over 
the years a large number of other techniques have been developed for testing 
for departures from normality. In the following we describe some powerful 
omnibus tests proposed for the problem. For further information on tests of 
normality, see Royston (1983, 1991, 1993a,b,c). 


Shapiro-Wilk’s W test 
Shapiro and Wilk (1965) proposed a relatively powerful procedure based on the statistic

$$W = \frac{\left(\sum_{i=1}^{n} a_i y_{(i)}\right)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad (2.21.9)$$

where $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$ represent the order statistics and the coefficients $a_i$ are the optimal weights for the weighted least squares estimator of the standard deviation for a normal population. Inasmuch as $a_{n-i+1} = -a_i$, the expression $\sum_{i=1}^{n} a_i y_{(i)}$ can be written as $\sum_{i=1}^{k} a_{n-i+1}(y_{(n-i+1)} - y_{(i)})$, where $k = n/2$ if $n$ is even, or $(n-1)/2$ if $n$ is odd. For $n$ odd, the middle observation is used in the calculation of $\sum_{i=1}^{n} (y_i - \bar{y})^2$, but is not used in the calculation of $\sum_{i=1}^{k} a_{n-i+1}(y_{(n-i+1)} - y_{(i)})$. Thus, for $n$ odd, $a_{(n+1)/2} = a_{k+1}$ appears as zero in Appendix Table XVIII. Also note that the W test is one-sided because the test statistic (2.21.9) is in a quadratic form: departures from normality in any direction tend to make $W$ small. The hypothesis of normality is rejected at the $\alpha$ significance level if $W$ is less than the $\alpha$th quantile of the null distribution of $W$. The coefficients $a_i$, for $2 \le n \le 50$, were given by the authors, and some selected values are given in Appendix Table XVIII. A short table of critical values of the statistic (2.21.9), originally given by Shapiro and Wilk (1965), is also reprinted in Appendix Table XIX. Royston (1982a, b) has provided an approximation to the null distribution of $W$ and a FORTRAN algorithm for $n \le 2000$. The Shapiro-Wilk W test is one of the most powerful omnibus tests for testing normality. Extensive empirical Monte Carlo simulation studies by Shapiro et al. (1968) and Pearson et al. (1977) have shown that $W$ is more powerful than competing tests against a wide range of alternative distributions. The test is found to be good against short or very long-tailed alternatives even for a sample as small as 10.


94 The Analysis of Variance 


Example 1. To illustrate the procedure, consider a sample of 10 obser- 
vations given by 2.4, 2.7, 2.6, 3.4, 3.2, 3.5, 3.2, 3.4, 3.6, and 3.5. The 
order statistics are determined as

2.4, 2.6, 2.7, 3.2, 3.2, 3.4, 3.4, 3.5, 3.5, 3.6.

Since $n = 10$, we have $k = 5$. Using the 5 coefficients $a_1, a_2, a_3, a_4, a_5$ from Appendix Table XVIII, we obtain

$$\begin{aligned} \sum_{i=1}^{k} a_{n-i+1}(y_{(n-i+1)} - y_{(i)}) &= 0.5739(3.6 - 2.4) + 0.3291(3.5 - 2.6) \\ &\quad + 0.2141(3.5 - 2.7) + 0.1224(3.4 - 3.2) + 0.0399(3.4 - 3.2) \\ &= 1.18861. \end{aligned}$$

Furthermore, for the given set of observations, $\sum_{i=1}^{n}(y_i - \bar{y})^2 = 1.645$. Hence, $W = (1.18861)^2/1.645 = 0.859$. From Appendix Table XIX, the 5 percent critical value of the W statistic is $W(10, 0.05) = 0.842$. Since $W > W(10, 0.05)$, we cannot reject the hypothesis of normality, and we conclude that it is reasonable to assume that the data are normally distributed.
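The computation of Example 1 can be reproduced with a short Python sketch. The coefficients are not computed here; they are passed in exactly as read from the table (assumed to be listed as $a_n, a_{n-1}, \ldots$ in decreasing order), and the function name `shapiro_wilk_w` is hypothetical. Running it on the Example 1 sample gives b = 1.18861, Σ(y_i − ȳ)² = 1.645 and W ≈ 0.859, matching the hand calculation.

```python
def shapiro_wilk_w(data, coeffs):
    """Shapiro-Wilk W (2.21.9) from tabulated coefficients.

    coeffs holds the k = n // 2 positive coefficients a_n, a_{n-1}, ..., a_{n-k+1}
    as read from a table; for odd n the middle observation gets weight zero
    automatically because it never appears in a pair.
    """
    ys = sorted(data)
    n = len(ys)
    k = n // 2
    if len(coeffs) != k:
        raise ValueError(f"need {k} coefficients for n = {n}")
    # b = sum_{i=1}^{k} a_{n-i+1} * (y_(n-i+1) - y_(i))
    b = sum(a * (ys[n - 1 - i] - ys[i]) for i, a in enumerate(coeffs))
    ybar = sum(ys) / n
    ss = sum((y - ybar) ** 2 for y in ys)
    return b * b / ss, b, ss


sample = [2.4, 2.7, 2.6, 3.4, 3.2, 3.5, 3.2, 3.4, 3.6, 3.5]
a_10 = [0.5739, 0.3291, 0.2141, 0.1224, 0.0399]  # coefficients used in Example 1
w, b, ss = shapiro_wilk_w(sample, a_10)
print(f"b = {b:.5f}, SS = {ss:.3f}, W = {w:.3f}")
```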


Shapiro-Francia’s test 
Shapiro and Francia (1972) proposed a modification of the W statistic defined by

$$W' = \frac{\left(\sum_{i=1}^{n} b_i y_{(i)}\right)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad (2.21.10)$$

where the coefficients $b_i$ are determined by

$$b_i = \frac{m_i}{\left(\sum_{j=1}^{n} m_j^2\right)^{1/2}},$$

with $m_i$ representing the expected values of the order statistics from a unit normal distribution. Inasmuch as $b_{n-i+1} = -b_i$, the expression $\sum_{i=1}^{n} b_i y_{(i)}$ can be written as $\sum_{i=1}^{k} b_{n-i+1}(y_{(n-i+1)} - y_{(i)})$, where $k = n/2$ if $n$ is even, or $(n-1)/2$ if $n$ is odd. For $n$ odd, the middle observation is used in the calculation of $\sum_{i=1}^{n}(y_i - \bar{y})^2$, but is not used in the calculation of $\sum_{i=1}^{k} b_{n-i+1}(y_{(n-i+1)} - y_{(i)})$. Again, note that the $W'$ test is one-sided because the test statistic (2.21.10) is in a quadratic form. The hypothesis of normality is rejected at the $\alpha$ significance level if $W'$ is less than the $\alpha$th quantile of the null distribution of $W'$. Extensive tables of $m_i$ are given in Harter (1961, 1969b). A small table of critical values of the statistic (2.21.10) is given by Shapiro and Francia (1972).


Example 2. To illustrate the procedure, we use the data on birthweights 
of twelve piglets in a particular litter from an experiment reported by 
Royston et al. (1982). The data have also been referred to and analyzed by 
Royston (1993c). It is widely believed that piglets provide a good model 
for the human neonate, especially in studies involving turnover of glucose. 
The order statistics of birthweight data and the corresponding expected 
values of the order statistics from a unit normal distribution are given in 
Table 2.12. 


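TABLE 2.12
Calculations for Shapiro-Francia's Test

i         1        2        3        4        5        6        7       8       9       10      11      12
y_(i)     605      858      862      992      1006     1018     1020    1079    1088    1110    1120    1166
m_i      -1.6292  -1.1157  -0.7929  -0.5368  -0.3122  -0.1025   0.1025  0.3122  0.5368  0.7929  1.1157  1.6292

For $n = 12$, $k = 6$, and with $\sum_{i=1}^{12} m_i^2 = 9.84778$, the coefficients $b_{n-i+1}$, $i = 1, 2, \ldots, 6$, are determined as

$$b_{12} = \frac{1.6292}{\sqrt{9.84778}} = 0.5192, \quad b_{11} = \frac{1.1157}{\sqrt{9.84778}} = 0.3555, \quad b_{10} = \frac{0.7929}{\sqrt{9.84778}} = 0.2527,$$

$$b_{9} = \frac{0.5368}{\sqrt{9.84778}} = 0.1711, \quad b_{8} = \frac{0.3122}{\sqrt{9.84778}} = 0.0995, \quad b_{7} = \frac{0.1025}{\sqrt{9.84778}} = 0.0327.$$

Now,

$$\begin{aligned} \sum_{i=1}^{k} b_{n-i+1}(y_{(n-i+1)} - y_{(i)}) &= 0.5192(1166 - 605) + 0.3555(1120 - 858) \\ &\quad + 0.2527(1110 - 862) + 0.1711(1088 - 992) \\ &\quad + 0.0995(1079 - 1006) + 0.0327(1020 - 1018) \\ &= 470.8363; \end{aligned}$$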


and

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = 263{,}616.6667.$$

Hence, the Shapiro-Francia statistic $W'$ is given by

$$W' = (470.8363)^2 / 263{,}616.6667 = 0.841.$$

Since the critical values of the $W'$ statistic are not readily available, we employ a normal approximation due to Royston (1993c), who found that the statistic $\log_e(1 - W')$ is approximately normally distributed with mean $\hat{\mu} = -1.2725 + 1.052(v - u)$ and standard deviation $\hat{\sigma} = 1.0308 - 0.26758(v + 2/u)$, where $u = \log_e(n)$ and $v = \log_e(u)$. The values of the normal deviate $Z' = \{\log_e(1 - W') - \hat{\mu}\}/\hat{\sigma}$ are referred to the upper-tail critical values of the standard normal distribution. Values of $Z' > 1.645$ indicate departures from normality at the 5 percent significance level. For the birthweight data considered previously,

$$\hat{\mu} = -2.929, \quad \hat{\sigma} = 0.572,$$

$$Z' = \{\log_e(1 - 0.841) - (-2.929)\}/0.572 = 1.91.$$

Since $Z' > 1.645$, the hypothesis of normality of the birthweight data is rejected ($p = 0.028$).
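A Python sketch of Example 2, assuming the twelve birthweights are 605 together with the eleven values listed in Table 2.12 (605 is the smallest value used in the calculation) and the $m_i$ of that table. The helper names `shapiro_francia_w` and `royston_sf_z` are hypothetical. Because the $b_i$ are not rounded to four decimals, $W'$ agrees with the hand computation only to about three decimals; the normal approximation uses the constants quoted above.

```python
import math
from statistics import NormalDist


def shapiro_francia_w(data, m):
    """W' of (2.21.10); m holds expected N(0, 1) order statistics, ascending."""
    ys = sorted(data)
    n = len(ys)
    norm = math.sqrt(sum(mi * mi for mi in m))
    b = [mi / norm for mi in m]                  # b_i = m_i / sqrt(sum m_j^2)
    num = sum(bi * yi for bi, yi in zip(b, ys))
    ybar = sum(ys) / n
    ss = sum((y - ybar) ** 2 for y in ys)
    return num * num / ss


def royston_sf_z(w_prime, n):
    """Normal approximation for log_e(1 - W') with the constants quoted in the text."""
    u = math.log(n)
    v = math.log(u)
    mu = -1.2725 + 1.052 * (v - u)
    sigma = 1.0308 - 0.26758 * (v + 2 / u)
    z = (math.log(1 - w_prime) - mu) / sigma
    p = 1 - NormalDist().cdf(z)                  # upper-tail p-value
    return mu, sigma, z, p


weights = [605, 858, 862, 992, 1006, 1018, 1020, 1079, 1088, 1110, 1120, 1166]
m12 = [-1.6292, -1.1157, -0.7929, -0.5368, -0.3122, -0.1025,
       0.1025, 0.3122, 0.5368, 0.7929, 1.1157, 1.6292]
w_prime = shapiro_francia_w(weights, m12)
mu, sigma, z, p = royston_sf_z(w_prime, len(weights))
print(f"W' = {w_prime:.3f}, mu = {mu:.3f}, sigma = {sigma:.3f}, Z' = {z:.2f}, p = {p:.3f}")
```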


D’Agostino’s D test 
D'Agostino (1971) proposed a test statistic

$$D = \frac{\sum_{i=1}^{n} \{i - \frac{1}{2}(n+1)\} y_{(i)}}{n \sqrt{n \sum_{i=1}^{n} (y_i - \bar{y})^2}}, \qquad (2.21.11)$$

which is also a modification of the W statistic in which the coefficients $a_i$ are replaced by $w_i = i - \frac{1}{2}(n+1)$; thus, no tables of coefficients are needed.

Note that, in contrast to the $W$ and $W'$ tests, the D test is two-sided, since the statistic (2.21.11) is in a linear form. The hypothesis of normality is rejected at the $\alpha$ significance level if $D$ is less than the $(\alpha/2)$th quantile or greater than the $(1 - \alpha/2)$th quantile of the null distribution of $D$. The test, originally proposed for moderate sample sizes, is also an omnibus test and can detect deviations from normality due to either skewness or kurtosis. Tables of percentage points of the approximate standardized D distribution of the statistic (2.21.11) were given by D'Agostino (1972). The test is computationally much simpler than the Shapiro-Wilk test. The studies by Theune (1973) have shown that the Shapiro-Wilk test is preferable to D'Agostino's test for sample sizes up to 50 for lognormal, chi-square, uniform, and U-shaped alternatives.


Example 3. To illustrate the procedure, we again consider the data of 
Example 1. For the given data set, 


$$\begin{aligned} \sum_{i=1}^{n} \{i - \tfrac{1}{2}(n+1)\} y_{(i)} &= \sum_{i=1}^{n} i\, y_{(i)} - \tfrac{1}{2}(n+1) \sum_{i=1}^{n} y_{(i)} \\ &= 1(2.4) + 2(2.6) + \cdots + 10(3.6) - \tfrac{1}{2}(10+1)(2.4 + 2.6 + \cdots + 3.6) \\ &= 184.2 - (5.5)(31.5) \\ &= 10.95 \end{aligned}$$

and, as before, $\sum_{i=1}^{n}(y_i - \bar{y})^2 = 1.645$. Hence, $D = 10.95/\{10\sqrt{10(1.645)}\} = 0.26998$. From Appendix Table XX, the 5 percent critical values of the D statistic are $D(10, 0.05) = (0.2513, 0.2849)$. Since the calculated value of $D$ lies in this interval, the hypothesis of normality is not rejected, and we may conclude that it is reasonable to assume that the data are normally distributed.
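D'Agostino's statistic needs no coefficient table, so the whole test fits in a few lines. In this sketch (the helper `dagostino_d` is a hypothetical name), the two-sided 5 percent limits for n = 10 are those quoted in Example 3; for other sample sizes they would have to be taken from a table of percentage points.

```python
import math


def dagostino_d(data):
    """D'Agostino's D (2.21.11) with weights i - (n + 1)/2 on the order statistics."""
    ys = sorted(data)
    n = len(ys)
    t = sum((i - (n + 1) / 2) * y for i, y in enumerate(ys, start=1))
    ybar = sum(ys) / n
    ss = sum((y - ybar) ** 2 for y in ys)
    d = t / (n * math.sqrt(n * ss))
    return d, t, ss


sample = [2.4, 2.7, 2.6, 3.4, 3.2, 3.5, 3.2, 3.4, 3.6, 3.5]
d, t, ss = dagostino_d(sample)
lower, upper = 0.2513, 0.2849  # two-sided 5 percent limits for n = 10 from Example 3
print(f"T = {t:.2f}, D = {d:.5f}, reject normality: {not (lower <= d <= upper)}")
```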


For discussions of tests especially designed for detecting outliers, see Hawkins 
(1980), Beckman and Cook (1983), and Barnett and Lewis (1994). Robust esti- 
mation procedures have also been employed in detecting extreme observations. 
The procedures give less weight to data values that are extreme in comparison 
to the rest of the data. Robust estimation techniques have been reviewed by 
Huber (1981) and Hampel et al. (1986). 


TESTS FOR HOMOSCEDASTICITY 


If there are just two populations (i.e., $a = 2$), the equality of the two population variances can be tested by using the usual F test, based on the fact that the statistic

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$

has Snedecor's F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. Here, $\sigma_1^2$ and $\sigma_2^2$ are the population variances and $S_1^2$ and $S_2^2$ are the corresponding sample estimators based on independent samples of sizes $n_1$ and $n_2$, respectively; under the hypothesis $\sigma_1^2 = \sigma_2^2$ the statistic reduces to $S_1^2/S_2^2$. However, with $a > 2$, rather than making all pairwise F tests, we want a single test that can be used to verify the assumption of equality of population variances. There are several tests available for this purpose. The three most commonly used tests are Bartlett's, Hartley's, and Cochran's tests.²⁵ Bartlett's test compares the weighted arithmetic and geometric means of the sample variances. Hartley's test compares the ratio of the largest to the smallest variance. Cochran's test compares the largest sample variance to the average of all the sample variances. We now describe these procedures and illustrate their applications with examples. These tests, however, have lower power than is desired for most applications and are adversely affected by nonnormality. In the following, we are concerned with testing the hypothesis

$$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_a^2 \quad \text{versus} \quad H_1: \sigma_i^2 \ne \sigma_j^2 \text{ for at least one } (i, j) \text{ pair}. \qquad (2.21.12)$$


Bartlett’s test 
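The basic idea underlying Bartlett's (1937a, b) test is as follows. Given the observations $y_{ij}$ from model (2.1.1), let

$$T_A = \frac{1}{\sum_{i=1}^{a}(n_i - 1)} \sum_{i=1}^{a} (n_i - 1) S_i^2$$

and

$$T_G = \left\{\prod_{i=1}^{a} (S_i^2)^{\,n_i - 1}\right\}^{1/\sum_{i=1}^{a}(n_i - 1)},$$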
²⁵ For a discussion of an exact test based on the generalized likelihood ratio principle, which is asymptotically equivalent to Bartlett's test, see Weerahandi (1995). For a discussion of a general class of tests for homogeneity of variances and their properties, see Cohen and Strawderman (1971).
where

$$S_i^2 = \frac{\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2}{n_i - 1}.$$

It should be noted that $T_A$ and $T_G$ are weighted arithmetic and geometric averages of the $S_i^2$'s, which are the usual sample variances of the observations at the different factor levels. It is well known that

$$T_G \le T_A,$$

and the two averages are equal if all the $S_i^2$'s are equal. Thus, the greater the variation among the $S_i^2$'s, the farther apart the two averages will be. Hence, if the ratio


$$R = T_A / T_G$$

is close to 1, we have no evidence that the population variances are unequal. If $R$ is large, it would indicate that the population variances are unequal. The same conclusion would follow if we use $\log_e(R) = \log_e(T_A) - \log_e(T_G)$ instead of $R$. Thus, Bartlett's test is based on the statistic $R$ or $\log_e(R)$, rejecting the null hypothesis if $R$ is significantly greater than unity (equivalently, if $\log_e(R)$ is significantly greater than zero).

²⁶ Equivalently, Bartlett's test can be based on the statistic $R^{-1} = T_G/T_A$, rejecting the null hypothesis if the statistic is significantly smaller than unity.

Inasmuch as the sampling distribution of $R$ or $\log_e(R)$ is not readily available, Bartlett considered two approximations of $R$. First, for large sample sizes, a function of $\log_e(R)$ has approximately a chi-square distribution with $a - 1$ degrees of freedom under the hypothesis that the population variances are equal. More specifically, if each $n_i > 5$, the statistic

$$B = \frac{K}{1 + L}, \qquad (2.21.13)$$

where

$$K = \sum_{i=1}^{a} (n_i - 1) \log_e(T_A) - \sum_{i=1}^{a} (n_i - 1) \log_e(S_i^2) \qquad (2.21.14)$$

and

$$L = \frac{1}{3(a - 1)} \left\{ \sum_{i=1}^{a} \frac{1}{n_i - 1} - \frac{1}{\sum_{i=1}^{a} (n_i - 1)} \right\}, \qquad (2.21.15)$$

has approximately a chi-square distribution with $a - 1$ degrees of freedom (see also Nagarsenker (1984)). Thus, if the calculated value of the statistic $B$ exceeds $\chi^2[a - 1, 1 - \alpha]$, the $100(1 - \alpha)$th percentage point of the chi-square distribution with $a - 1$ degrees of freedom, we reject the null hypothesis that the population variances are equal. The accuracy of this approximation has been considered by Bishop and Nair (1939), Hartley (1940), and Barnett (1962).
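The chi-square version of Bartlett's test, (2.21.13) through (2.21.15), can be sketched in Python as below, using the degrees of freedom and sample variances of Table 2.13 (Example 4). The helper names are hypothetical. To keep the sketch self-contained without a statistics library, the p-value uses the closed-form chi-square upper tail that holds only for an even number of degrees of freedom; that restriction belongs to the sketch, not to the test.

```python
import math


def bartlett_chi_square(dfs, variances):
    """Bartlett's statistic B = K / (1 + L) of (2.21.13)-(2.21.15).

    dfs: the degrees of freedom n_i - 1; variances: the sample variances S_i^2.
    Returns (K, L, B); B is referred to chi-square with a - 1 df.
    """
    a = len(dfs)
    nu = sum(dfs)
    t_a = sum(f * s for f, s in zip(dfs, variances)) / nu   # weighted arithmetic mean
    k = nu * math.log(t_a) - sum(f * math.log(s) for f, s in zip(dfs, variances))
    l = (sum(1 / f for f in dfs) - 1 / nu) / (3 * (a - 1))
    return k, l, k / (1 + l)


def chi_square_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


# Example 4 (Table 2.13): n_i - 1 and S_i^2 for the five treatments.
k, l, b = bartlett_chi_square([2, 3, 2, 3, 2], [16.0, 6.0, 3.0, 4.0, 4.0])
p = chi_square_sf_even_df(b, df=4)
print(f"K = {k:.4f}, L = {l:.4f}, B = {b:.4f}, p = {p:.3f}")
```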

The chi-square approximation to the distribution of Bartlett's test statistic (2.21.13) is not appropriate when any of the $n_i$'s are less than five. An approximation that is more accurate when some of the $n_i$'s are small is based on the F distribution. The approximation consists of considering the statistic

$$B' = \frac{\nu_2 K}{\nu_1 (M - K)}, \qquad (2.21.16)$$

where

$$\nu_1 = a - 1, \qquad (2.21.17)$$

$$\nu_2 = (a + 1)/L^2, \qquad (2.21.18)$$

and

$$M = \nu_2 / \{1 - L + 2/\nu_2\}, \qquad (2.21.19)$$

which has a sampling distribution approximated by an F distribution with $\nu_1$ and $\nu_2$ degrees of freedom. The value of $\nu_2$ will usually not be an integer, and it may be necessary to interpolate in the F table. Good accuracy can be achieved by the method of two-way harmonic interpolation (see, e.g., Laubscher (1965)) based on the reciprocals of the degrees of freedom. Usually, however, the observed value of $B'$ will differ substantially from the tabulated value, and in that case an interpolation may not be required. When $a = 2$ and the sample sizes are equal, Bartlett's test reduces to the two-sided variance ratio F test. When the two sample sizes are unequal, however, the two methods may give different results (Maurais and Ouimet (1986)).
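The F approximation (2.21.16) through (2.21.19), together with harmonic interpolation in the F table, can be sketched as follows. The inputs K = 2.1005 and L = 0.1736 and the tabulated values F[4, 120; 0.95] = 2.45 and F[4, ∞; 0.95] = 2.37 are those of Example 4; the function names are hypothetical. Because ν₂ is not rounded to 199.1 before M is computed, the results differ slightly from the hand calculation in the last digits.

```python
import math


def bartlett_box_f(k, l, a):
    """F approximation (2.21.16)-(2.21.19) to Bartlett's test.

    Returns (nu1, nu2, M, B_prime); B_prime is referred to F(nu1, nu2).
    """
    nu1 = a - 1
    nu2 = (a + 1) / l ** 2
    m = nu2 / (1 - l + 2 / nu2)
    b_prime = nu2 * k / (nu1 * (m - k))
    return nu1, nu2, m, b_prime


def harmonic_interpolate(nu, nu_lo, f_lo, nu_hi, f_hi):
    """Interpolate a tabulated F percentage point linearly in 1/nu.

    nu_hi may be math.inf, since 1 / math.inf == 0.0 in Python.
    """
    x, x_lo, x_hi = 1 / nu, 1 / nu_lo, 1 / nu_hi
    return f_lo + (x_lo - x) / (x_lo - x_hi) * (f_hi - f_lo)


# K and L as computed in Example 4; a = 5 treatments.
nu1, nu2, m, b_prime = bartlett_box_f(k=2.1005, l=0.1736, a=5)
# Tabulated F[4, 120; 0.95] = 2.45 and F[4, inf; 0.95] = 2.37 as quoted in Example 4.
f_crit = harmonic_interpolate(nu2, 120, 2.45, math.inf, 2.37)
print(f"nu2 = {nu2:.1f}, M = {m:.2f}, B' = {b_prime:.3f}, F critical ~ {f_crit:.3f}")
```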


Remark: Exact critical values obtained from the null distributions of Bartlett’s statistic 
for the case involving equal sample sizes have been given by Harsaae (1969), Glaser 
(1976), and Dyer and Keating (1980). For very small values of the $n_i$'s, tables are given in
Hartley (1940) and Pearson and Hartley (1970). For equal sample sizes, some selected 
percentage points of the distribution are given in Appendix Table XXI. Algebraic ex- 
pressions for determining exact critical values for Bartlett’s test for unequal sample sizes 
have been derived by Chao and Glaser (1978) and Dyer and Keating (1980). 
Example 4. To illustrate the procedure, consider the data given in Table 2.5. The calculations needed for Bartlett's test are summarized in Table 2.13. On substituting the appropriate quantities into (2.21.14) and (2.21.15), and then into (2.21.13), we obtain

$$K = 2.1005, \quad L = 0.1736,$$

$$B = 2.1005/(1 + 0.1736) = 1.7898.$$

Since $B = 1.7898$ is less than $\chi^2[4, 0.95] = 9.49$ from Appendix Table IV (the p-value is 0.774), we do not reject the null hypothesis that the five variances are all equal.

TABLE 2.13
Calculations for Bartlett's Test

Treatment   n_i − 1   S_i^2     log_e S_i^2   (n_i − 1)S_i^2   (n_i − 1) log_e S_i^2
I           2         16.0000   2.7726        32.0000          5.5452
II          3          6.0000   1.7918        18.0000          5.3754
III         2          3.0000   1.0986         6.0000          2.1972
IV          3          4.0000   1.3863        12.0000          4.1589
V           2          4.0000   1.3863         8.0000          2.7726

To use the F approximation given by (2.21.16), on substituting the appropriate quantities into (2.21.17), (2.21.18), and (2.21.19), and then into (2.21.16), we have

$$\nu_1 = 4, \quad \nu_2 = 6/(0.1736)^2 = 199.1,$$

$$M = 199.1/(1 - 0.1736 + 2/199.1) = 238.0311,$$

$$B' = \frac{(199.1)(2.1005)}{4(238.0311 - 2.1005)} = 0.44.$$


Furthermore, for $\alpha = 0.05$, we obtain from Appendix Table V that $F[4, 120; 0.95] = 2.45$ and $F[4, \infty; 0.95] = 2.37$. Using harmonic interpolation, based on the reciprocals of the degrees of freedom, we have

$$F[4, 199.1; 0.95] \approx 2.45 + \frac{1/120 - 1/199.1}{1/120 - 1/\infty}(2.37 - 2.45) = 2.418.$$

Since $B' = 0.44 < 2.418$, with a p-value of $P\{F[4, 199.1] > 0.44\} = 0.779$, we do not reject the hypothesis that the five variances are all equal. Thus, the conclusion from Bartlett's test using the F approximation is the same as that using the chi-square approximation.


Example 5. In this example, we illustrate the Shapiro-Wilk W test for normality followed by Bartlett's test for homogeneity of variances. We further describe the use of exact critical values for Bartlett's test statistic given in Appendix Table XXI. Dyer and Keating (1980) reported and analyzed data on the sealed bids on each of five Texas offshore oil and gas leases selected from 110 leases issued on May 21, 1968. Using probability plots, it was found that the bids on each of the five leases are lognormally distributed. The logarithms of the sealed bids on each of the five leases are given in Table 2.14.

Proceeding as in Example 1, the Shapiro-Wilk W statistic for each of the five groups of leases is determined as

$$W_1(8) = \{0.6052(16.269 - 13.521) + \cdots + 0.0561(15.035 - 14.847)\}^2 / 7(0.842) = 0.982,$$

$$W_2(10) = \{0.5739(16.292 - 12.597) + \cdots + 0.0399(14.430 - 14.307)\}^2 / 9(1.282) = 0.970,$$

$$W_3(5) = \{0.6646(13.980 - 11.629) + 0.2413(13.273 - 12.134)\}^2 / 4(0.859) = 0.982,$$

$$W_4(12) = \{0.5475(17.589 - 13.003) + \cdots + 0.0303(15.539 - 15.370)\}^2 / 11(1.883) = 0.960,$$


TABLE 2.14
Data on Log-bids of Five Texas Offshore Oil and Gas Leases

                       Lease No.
I         II        III       IV        V
16.269    16.292    13.980    17.589    17.188
15.733    15.223    13.273    16.557    16.712
15.256    15.100    12.616    16.264    16.259
15.035    14.995    12.134    15.957    16.128
14.847    14.430    11.629    15.910    15.463
14.223    14.307              15.539    15.100
13.987    13.520              15.370    14.565
13.521    13.463              14.847    14.519
          13.129              14.785    13.521
          12.597              13.521    13.014
                              13.503    13.003
                              13.003    12.622
                                        12.530

n_1 = 8, S_1^2 = 0.842;  n_2 = 10, S_2^2 = 1.282;  n_3 = 5, S_3^2 = 0.859;  n_4 = 12, S_4^2 = 1.883;  n_5 = 13, S_5^2 = 2.635.

Source: Dyer and Keating (1980). Used with permission.
and

$$W_5(13) = \{0.5359(17.188 - 12.530) + \cdots + 0.0539(15.100 - 14.519)\}^2 / 12(2.635) = 0.928.$$


From Appendix Table XIX, the 5 percent critical values for the W statistic in each of the five groups are

$$W_1(8, 0.05) = 0.818, \quad W_2(10, 0.05) = 0.842, \quad W_3(5, 0.05) = 0.762,$$

$$W_4(12, 0.05) = 0.859, \quad W_5(13, 0.05) = 0.866.$$

Since $W_i(n_i) > W_i(n_i, 0.05)$ in each group, the lognormality of the bids data is not rejected at the 5 percent significance level. In fact, it can be verified that the hypothesis of lognormality is sustained at a significance level of 0.5 or lower.


We now test for homogeneity of variances using Bartlett's test. For the log-bids data, the weighted arithmetic and geometric means of the $S_i^2$'s, $T_A$ and $T_G$, are determined as

$$T_A = \frac{\sum_{i=1}^{5}(n_i - 1)S_i^2}{\sum_{i=1}^{5}(n_i - 1)} = \frac{(7 \times 0.842) + \cdots + (12 \times 2.635)}{7 + \cdots + 12} = 1.7023$$

and

$$T_G = \left\{\prod_{i=1}^{5}(S_i^2)^{\,n_i - 1}\right\}^{1/\sum_{i=1}^{5}(n_i - 1)} = \{(0.842)^{7} \cdots (2.635)^{12}\}^{1/43} = 1.5560.$$

Hence, Bartlett's test statistic, here taken in the form $B = T_G/T_A$, is given by

$$B = T_G/T_A = 1.5560/1.7023 = 0.9141.$$

From Appendix Table XXI, the 5 percent critical value is approximately determined as

$$\begin{aligned} B(8, 10, 5, 12, 13; 0.05) &\approx \left(\tfrac{8}{48}\right)0.7512 + \left(\tfrac{10}{48}\right)0.8025 + \left(\tfrac{5}{48}\right)0.5982 \\ &\quad + \left(\tfrac{12}{48}\right)0.8364 + \left(\tfrac{13}{48}\right)0.8498 \\ &= 0.7940. \end{aligned}$$

Since $B > 0.7940$, the hypothesis of homogeneity is not rejected at the 5 percent significance level. As a matter of fact, it can be verified that the approximate 25 percent critical value is 0.8757, and the hypothesis of homogeneity is sustained at a significance level of 0.25 or lower.
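The Bartlett computations of Example 5 can be sketched in Python. The helper names are hypothetical, and the approximate unequal-sample-size critical value is assumed to be the weighted average Σ n_i b(a, n_i)/N of the equal-sample-size critical values, with N = Σ n_i = 48; the five equal-n values are those quoted from Appendix Table XXI in the example. In this form, small values of B = T_G/T_A indicate heterogeneity.

```python
import math


def bartlett_ratio(ns, variances):
    """Return (T_A, T_G, T_G / T_A) for sample sizes n_i and sample variances S_i^2."""
    dfs = [n - 1 for n in ns]
    nu = sum(dfs)
    t_a = sum(f * s for f, s in zip(dfs, variances)) / nu                 # arithmetic
    t_g = math.exp(sum(f * math.log(s) for f, s in zip(dfs, variances)) / nu)  # geometric
    return t_a, t_g, t_g / t_a


def weighted_critical_value(ns, equal_n_values):
    """Approximate unequal-n critical value as sum(n_i * b(a, n_i)) / N.

    equal_n_values[i] is the tabulated equal-sample-size critical value for n_i.
    """
    total = sum(ns)
    return sum(n * b for n, b in zip(ns, equal_n_values)) / total


ns = [8, 10, 5, 12, 13]
s2 = [0.842, 1.282, 0.859, 1.883, 2.635]
t_a, t_g, ratio = bartlett_ratio(ns, s2)
crit = weighted_critical_value(ns, [0.7512, 0.8025, 0.5982, 0.8364, 0.8498])
# Reject homogeneity only if the ratio falls below the critical value.
print(f"T_A = {t_a:.4f}, T_G = {t_g:.4f}, B = {ratio:.4f}, 5% critical ~ {crit:.4f}")
```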


Hartley’s test 

Hartley (1950) developed a test for the hypothesis (2.21.12) when the sample
sizes are all equal; that is, $n_i = n$, $i = 1, 2, \ldots, a$. The test represents a natural
extension of the $F$ test for the case $a = 2$. If the $S_i^2$’s denote the sample
variances, then the test statistic is defined by


$$H = \frac{\max(S_i^2)}{\min(S_i^2)}, \tag{2.21.20}$$


where $\max(S_i^2)$ and $\min(S_i^2)$ denote the largest and smallest sample variances,
respectively. Naturally, when the population variances are all equal, the value
of $H$ would be expected to be near 1, and the greater the variation among the $S_i^2$’s, the
larger the value of $H$. The decision rule consists of rejecting the null hypothesis
(2.21.12) if the calculated value of $H$ exceeds $H[a, \nu; 1 - \alpha]$, the $100(1 - \alpha)$th
percentage point of the distribution of $H$, where $\nu = n - 1$.


Remark: The distribution of the statistic (2.21.20) depends on $a$ and $n$, and initial
tables of the 1 and 5 percent critical values were originally given by Hartley (1950).
Later, David (1952) gave tables for $\alpha = 0.05, 0.01$, $a = 2(1)12$, and $\nu = n - 1 =
2(1)10, 12, 15, 20, 30, 60, \infty$. These tables are also given in Owen (1962) and Pearson
and Hartley (1970). Some selected percentage points of the $H$ distribution are given in
Appendix Table XXII.


Example 6. To illustrate the procedure, we consider the data given in 
Table 2.3. The sample variances are as follows: 


$S_1^2 = 406.8$, $S_2^2 = 97.6$, $S_3^2 = 80.8$, and $S_4^2 = 69.2$.


Now, we have 


$\max(S_i^2) = 406.8$, $\min(S_i^2) = 69.2$,


and the statistic (2.21.20) is given by 


$$H = \frac{406.8}{69.2} = 5.88.$$


From Appendix Table XXII, we have $H[4, 5; 0.95] = 13.70$; since $H = 5.88 < 13.70$, we do
not reject the null hypothesis that the four variances are all equal.
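Hartley's statistic is simple enough to compute directly. The sketch below (function name ours) reproduces Example 6 and compares $H$ with the tabled value $H[4, 5; 0.95] = 13.70$, which still has to be looked up in Appendix Table XXII.

```python
def hartley_h(variances):
    """Hartley's statistic (2.21.20): largest over smallest sample variance."""
    return max(variances) / min(variances)


s2 = [406.8, 97.6, 80.8, 69.2]      # sample variances of Table 2.3
h = hartley_h(s2)
critical = 13.70                    # H[4, 5; 0.95] from Appendix Table XXII
print(f"H = {h:.2f}, reject H0: {h > critical}")
```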


Cochran’s test 

Cochran (1941) developed a test for homoscedasticity especially designed for 
the case when one variance is very much larger than the others and the sample 
sizes are all equal. The test statistic is given by


$$C = \frac{\max(S_i^2)}{\sum_{i=1}^{a} S_i^2}. \tag{2.21.21}$$
The decision rule consists of rejecting the null hypothesis (2.21.12) if the cal-
culated value of $C$ exceeds $C[a, \nu; 1 - \alpha]$, the $100(1 - \alpha)$th percentage point
of the distribution of $C$.


Remark: The distribution of the statistic (2.21.21) depends on $a$ and $n$, and initial tables
of the upper 5 percent points for $a = 3(1)10$ and $\nu = n - 1 = 1(1)6(2)10$ were given
by Cochran (1941). Later, Eisenhart and Solomon (1947) gave tables for $\alpha = 0.05, 0.01$,
$a = 2(1)12, 15, 20, 24, 30, 40, 60, 120, \infty$, and $\nu = 1(1)10, 16, 36, 144, \infty$. These
tables are also given in Pearson and Hartley (1973). A more comprehensive tabulation of
the statistic $C$ appears in a publication by the Japanese Standards Association (1972),
which presents the percentage points of $C$ for $a = 2(1)20$; $n = 2(1)31, 41, 61, 121, \infty$;
and $\alpha = 0.05, 0.01$. Some selected percentage points of the distribution of
$C$ are reprinted in Appendix Table XXIII.


Example 7. To illustrate the procedure, we again consider the data of
Table 2.3, as for Hartley’s test. The sample variances lead to
the value of the test statistic (2.21.21) given by

$$C = \frac{406.8}{406.8 + 97.6 + 80.8 + 69.2} = 0.6216.$$


From Appendix Table XXIII, we have C[4, 5; 0.95] = 0.5895 and there- 
fore we reject the null hypothesis (2.21.12) that the variances are all equal. 
Note that for the same data Cochran’s test leads to the rejection of the null 
hypothesis whereas Hartley’s test fails to reach the critical value. 
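Cochran's statistic for the same four variances can be sketched the same way (function name ours); the critical value 0.5895 is again taken from the table rather than computed.

```python
def cochran_c(variances):
    """Cochran's statistic (2.21.21): largest variance over the sum of all variances."""
    return max(variances) / sum(variances)


s2 = [406.8, 97.6, 80.8, 69.2]      # sample variances of Table 2.3
c = cochran_c(s2)
critical = 0.5895                   # C[4, 5; 0.95] from Appendix Table XXIII
print(f"C = {c:.4f}, reject H0: {c > critical}")
```

Because the denominator pools all the variances, a single dominant variance pushes $C$ toward 1, which is why this test reacts to the one large variance in Table 2.3 while Hartley's ratio does not.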


Comments on Bartlett’s, Hartley’s and Cochran’s tests 
(i) In most practical situations, Hartley’s and Cochran’s tests will lead to
similar conclusions. Since Cochran’s test utilizes more of the information in the
sample data, it is generally more sensitive than Hartley’s test. When the normality
assumption can be relied upon, Bartlett’s test is more powerful than the other tests
(Gartside (1972)).

(ii) Both Hartley’s and Cochran’s tests require that all sample sizes be equal.
If the sample sizes are unequal but do not differ greatly, the tests may still be
used as approximate tests. In this case, the average sample size is used as the
value of $n$ in determining the percentage points of the test statistics.
Some statisticians recommend using the largest $n$ for this purpose. This
procedure will result in a probability of type I error slightly larger than
the prescribed value.

(iii) All the test procedures are sensitive to departures from normality (Box
(1953); Box and Anderson (1955)). That is, if the populations from which
the samples are taken are not normally distributed, the actual level of significance
may differ greatly from the specified one. The tests tend to mask existing
differences in variances if the excess kurtosis is smaller than zero, or to exhibit
nonexistent differences if the excess kurtosis is greater than zero. Thus, the values
of the test statistics may lead to an erroneous rejection of the null hypothesis.
Therefore, it is not advisable to test for homoscedasticity unless there is
sufficient evidence to assume that the distributions are at least approximately
normal. It is recommended that any homoscedasticity test be used only when
preceded by a preliminary test that does not reject normality. However, if the
test is performed as a check before using an analysis of variance procedure,
then the rejection of the null hypothesis indicates that at least one of the two
underlying assumptions (normality and equal variances) is violated. Indeed, it
has been found that Bartlett’s test is a good test for detecting departures from
normality.

(iv) As noted in the preceding section, the analysis of variance $F$ test is
not much affected by unequal variances as long as the differences in the
variances are not too large and the sample sizes are nearly equal. Hence, a
fairly low level of $\alpha$ may be justified in conducting the test for the equality of
variances when the sample sizes are nearly equal. This would be appropriate
in determining the aptness of the analysis of variance model (2.1.1), since only
large differences between variances need to be detected.


Other tests of homoscedasticity 

The preceding tests of homoscedasticity are traditional tests based on normal
theory for testing the null hypothesis of equal variances. However, they are all
very sensitive to the assumption of normality and give too many significant
results for data coming from a long-tailed distribution. In recent years, a
number of tests have appeared in the literature that are less sensitive to
nonnormality in the data and have been found to have good power for a variety
of population distributions. Levene (1960) proposed a test that treats the scores
$z_{ij} = (y_{ij} - \bar{y}_{i.})^2$ as identically distributed normal variates and applies
the usual $F$ test to these scores. A significant difference between the means of
the transformed scores is taken as evidence of significant differences in the
variances of the groups. Levene (1960) also proposed using $F$ tests based on the scores
$z_{ij} = |y_{ij} - \bar{y}_{i.}|$, $z_{ij} = \log_e|y_{ij} - \bar{y}_{i.}|$, and $z_{ij} = |y_{ij} - \bar{y}_{i.}|^{1/2}$.
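Levene's approach amounts to transforming each observation into a score and then running the ordinary one-way analysis of variance on the scores. The Python sketch below shows this for the squared- and absolute-deviation scores; the helper names, the `kind` parameter, and the data are hypothetical and purely illustrative. The logarithmic score is omitted because it is undefined whenever an observation equals its group mean.

```python
from statistics import mean


def one_way_f(groups):
    """Usual one-way ANOVA F statistic with a - 1 and N - a degrees of freedom."""
    means = [mean(g) for g in groups]
    scores = [z for g in groups for z in g]
    grand = mean(scores)
    a, n_total = len(groups), len(scores)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((z - m) ** 2 for g, m in zip(groups, means) for z in g)
    return (ss_between / (a - 1)) / (ss_within / (n_total - a))


def levene_scores(group, kind="squared"):
    """Levene-type scores: squared or absolute deviations from the group mean."""
    m = mean(group)
    if kind == "squared":
        return [(y - m) ** 2 for y in group]
    if kind == "absolute":
        return [abs(y - m) for y in group]
    raise ValueError(f"unknown score type: {kind}")


def levene_f(groups, kind="squared"):
    """F statistic from the one-way ANOVA applied to the Levene scores."""
    return one_way_f([levene_scores(g, kind) for g in groups])


# Made-up illustrative data: the second group is visibly more spread out
groups = [
    [10.2, 11.5, 9.8, 10.9, 10.4],
    [8.1, 13.9, 10.5, 6.2, 12.8],
    [10.0, 10.6, 9.9, 10.3, 10.1],
]
for kind in ("squared", "absolute"):
    print(kind, round(levene_f(groups, kind), 3))
```

The resulting $F$ values would be referred to the $F$ distribution with $a - 1$ and $N - a$ degrees of freedom (here 2 and 12).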

Following Levene (1960), a number of other robust procedures have been
proposed that are essentially based on applying the analysis of variance
to transformed scores. For example, Brown and Forsythe (1974c) proposed
using transformed scores based on the absolute deviations from the median.
In order to increase power when sample sizes are odd, Ramsey and Brailsford
(1989) suggested that the median be replaced by the pseudo median, equal to the
midpoint of the scores just above and below the median. A somewhat different
approach, known as the jackknife, was proposed by Miller (1968), in which the
original scores in each group are replaced by the contribution of that observation
to the group variance.²⁷ O’Brien (1979, 1981) proposed a procedure that is
a blend of Levene’s squared deviation scores and the jackknife. It performs
the analysis of variance using


$$\frac{(n_i - 1.5)\,n_i\,(y_{ij} - \bar{y}_{i.})^2 - 0.5\,S_i^2\,(n_i - 1)}{(n_i - 1)(n_i - 2)},$$

where $\bar{y}_{i.}$ and $S_i^2$ represent the mean and variance, respectively, for the $i$-th factor
level.
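The Brown-Forsythe, O'Brien, and jackknife procedures differ only in the scores fed to the same one-way analysis of variance, so they can share one $F$ routine. The sketch below is a minimal Python rendering under stated assumptions: the function names are ours, `pseudo_median` reflects our reading of the Ramsey-Brailsford definition (midpoint of the two scores adjacent to the median when $n_i$ is odd), the jackknife scores follow footnote 27, and the data are made up. O'Brien's and the jackknife scores need $n_i \geq 3$.

```python
import math
from statistics import mean, median, variance


def one_way_f(groups):
    """Usual one-way ANOVA F statistic with a - 1 and N - a degrees of freedom."""
    means = [mean(g) for g in groups]
    scores = [z for g in groups for z in g]
    grand = mean(scores)
    a, n_total = len(groups), len(scores)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((z - m) ** 2 for g, m in zip(groups, means) for z in g)
    return (ss_between / (a - 1)) / (ss_within / (n_total - a))


def pseudo_median(group):
    """Midpoint of the scores just below and above the median when n is odd
    (our reading of Ramsey and Brailsford); the ordinary median otherwise."""
    s = sorted(group)
    n = len(s)
    if n % 2 == 0 or n < 3:
        return median(s)
    k = n // 2                               # index of the middle score
    return (s[k - 1] + s[k + 1]) / 2


def brown_forsythe_scores(group, center=median):
    """Absolute deviations from a robust center (the median by default)."""
    c = center(group)
    return [abs(y - c) for y in group]


def obrien_scores(group):
    """O'Brien's transformed scores; their group mean equals S_i^2."""
    n, m, s2 = len(group), mean(group), variance(group)
    return [((n - 1.5) * n * (y - m) ** 2 - 0.5 * s2 * (n - 1)) / ((n - 1) * (n - 2))
            for y in group]


def jackknife_scores(group):
    """Miller's scores n_i log s_i^2 - (n_i - 1) log s_i(j)^2, deleting one value at a time."""
    g = list(group)
    n = len(g)
    log_s2 = math.log(variance(g))
    return [n * log_s2 - (n - 1) * math.log(variance(g[:j] + g[j + 1:]))
            for j in range(n)]


# Made-up illustrative data
groups = [
    [10.2, 11.5, 9.8, 10.9, 10.4],
    [8.1, 13.9, 10.5, 6.2, 12.8],
    [10.0, 10.6, 9.9, 10.3, 10.1],
]
scorers = {
    "Brown-Forsythe (median)": brown_forsythe_scores,
    "Brown-Forsythe (pseudo median)": lambda g: brown_forsythe_scores(g, pseudo_median),
    "O'Brien": obrien_scores,
    "jackknife": jackknife_scores,
}
for label, scorer in scorers.items():
    print(f"{label:31s} F = {one_way_f([scorer(g) for g in groups]):.3f}")
```

A useful check on the O'Brien formula is that the scores in each group average exactly to $S_i^2$, so the analysis of variance on them compares the group variances directly.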

In recent years, there have been a number of studies investigating the ro- 
bustness of these procedures and they point toward the robustness of Brown- 
Forsythe and O’Brien procedures. More recently, Algina et al. (1995) have 
proposed a procedure, called the maximum test for scale, in which the test statistic
is the more extreme of the Brown-Forsythe and O’Brien test statistics. Some
limited simulation work for the two-sample case indicates better type I and
type II error rates than either the Brown-Forsythe or the O’Brien procedure.
For further discussions and details, the reader is referred to Games et al.
(1972), Hall (1972), Layard (1973), Levy (1978b), Keselman et al. (1979), 
Conover et al. (1981), Olejnik and Algina (1987), Micceri (1989), Ramsey 
(1994), and Algina et al. (1995). 


2.22 CORRECTIONS FOR DEPARTURES 
FROM ASSUMPTIONS OF THE MODEL 


If the data set in a given problem violates the assumptions of the analysis of 
variance model (2.1.1), a choice of possible corrective measures is available. 
One is to modify the model. However, this approach has the disadvantage that 
more often than not the modified model involves fairly complex analysis. An- 
other approach may be to consider using some nonparametric tests which do not 
make the normal theory assumption for inference problems. A third approach 
to be discussed in this section is to use transformations on the data. Sometimes 
it is possible to make an algebraic transformation of the data to make them 
appear more nearly normally distributed, or to make the variances of the error 
terms constant. Conclusions derived from the statistical analyses performed on 
the transformed data are also applicable to the original data. In this section, we 
briefly discuss some commonly used transformations to correct for the lack of
normality and homoscedasticity. Tukey (1955) discussed the use of transforma-
tions such that effects in the transformed scale are additive. Although the individual
transformations that correct for lack of normality, homoscedasticity, and
nonadditivity may be different, Box and Cox (1964, 1982) found that often a
single transformation will simultaneously rectify all the problems.


27 The jackknife procedure computes sample variances within each group by deleting one
observation at a time. Thus, in the $i$-th group, $n_i$ variances are computed as follows:

$$s_{i(j)}^2 = \frac{1}{n_i - 2}\sum_{\ell \neq j}\left(y_{i\ell} - \bar{y}_{i(j)}\right)^2, \quad \text{where} \quad \bar{y}_{i(j)} = \frac{1}{n_i - 1}\sum_{\ell \neq j} y_{i\ell}.$$

The analysis of variance is performed on the transformed scores $z_{ij} = n_i \log_e(s_i^2) - (n_i - 1)\log_e(s_{i(j)}^2)$
$(i = 1, 2, \ldots, a;\ j = 1, 2, \ldots, n_i)$, and the test statistic is the usual $F$ statistic with
$a - 1$ and $N - a$ degrees of freedom.


Remark: For further discussions of transformations, the reader may refer to Bartlett 
(1936, 1947), Cochran (1940), Curtiss (1943), Bartlett and Kendall (1946), Eisenhart
(1947b), Freeman and Tukey (1950), Tukey (1957), Draper and Hunter (1969), Cox 
(1977), Draper and Smith (1981, pp. 220-221), Efron (1982), Berry (1987), and Hoaglin 
(1988). Natrella (1963, Chapter 20) provides a detailed and thorough discussion of the 
use of transformations. An extremely thorough and detailed monograph on transfor- 
mation methodology has been prepared by Thöni (1967). An excellent and thorough
introduction and a bibliography of the topic can be found in a review paper by Hoyle 
(1973). For a more recent bibliography of articles on transformations, see Draper and 
Smith (1981, pp. 683-684). 


TRANSFORMATIONS TO CORRECT LACK OF NORMALITY 


Here, we discuss some transformations to correct for the departures from nor- 
mality. 


Logarithmic transformation 
Suppose the data are distributed according to the relationship

$$y_{ij} = \beta e^{\alpha_i + e_{ij}}, \tag{2.22.1}$$

where the $e_{ij}$’s are normally and independently distributed, each with mean zero
and variance $\sigma_e^2$. Then, on making a logarithmic transformation of (2.22.1), we
get

$$\log_e(y_{ij}) = \log_e(\beta) + \alpha_i + e_{ij},$$

which can be rewritten as

$$y'_{ij} = \mu + \alpha_i + e_{ij}, \tag{2.22.2}$$

where $y'_{ij} = \log_e(y_{ij})$ and $\mu = \log_e(\beta)$. From (2.22.2), we notice that although the $y_{ij}$’s are not normally distributed,
the transformed variables $y'_{ij}$’s are. This may be the case when the distribution
of the $y_{ij}$’s is skewed.


Square-root transformation 
Suppose the sample observations are given by the relationship

$$y_{ij} = (\mu + \alpha_i + e_{ij})^2, \tag{2.22.3}$$

where, as in (2.22.1), the $e_{ij}$’s are normally and independently distributed with
mean zero and variance $\sigma_e^2$. Then, on making the square-root transformation,
we get

$$y'_{ij} = \sqrt{y_{ij}} = \mu + \alpha_i + e_{ij}. \tag{2.22.4}$$

From (2.22.4), we notice that although the $y_{ij}$’s are not normally distributed,
the transformed variables $y'_{ij}$’s are. This may be the case when the $y_{ij}$’s are
nonnegative real numbers and their distribution is skewed to the right.


Arcsine transformation 
Suppose the sample scores $y_{ij}$’s are binomial proportions with mean $p$ based
on samples of size $n$. Then, the transformed scores²⁸

$$y'_{ij} = 2\arcsin\sqrt{y_{ij}} \tag{2.22.5}$$

are approximately normally distributed with approximate mean $\mu' = 2\arcsin\sqrt{p}$
and variance $1/n$. The transformation (2.22.5) does not perform as well at
the extreme ends of the possible values (that is, when $n y_{ij}$ is near 0 or $n$). Anscombe (1948) and
Freeman and Tukey (1950) proposed some improved arcsine transformations
given by

$$y'_{ij} = \arcsin\sqrt{\frac{n y_{ij} + 3/8}{n + 3/4}}$$

and

$$y'_{ij} = \frac{1}{2}\left[\arcsin\sqrt{\frac{n y_{ij}}{n + 1}} + \arcsin\sqrt{\frac{n y_{ij} + 1}{n + 1}}\right].$$

TRANSFORMATIONS TO CORRECT LACK OF HOMOSCEDASTICITY 




There are several types of data in which the variances of the error terms are not 
constant. If there is evidence of some systematic relationship between treatment 
mean and variance, homogeneity of the error variance may be achieved through 
an appropriate transformation of the data. Bartlett (1936) has given a formula 
for deriving such transformations provided the relationship between $\mu_i$ and $\sigma_i^2$
is known. In many cases where the nature of the relationship is not clear, the 
experimenter can, through trial and error, find a transformation that will stabi- 
lize the variance. We now consider some commonly employed transformations 
to stabilize the variance. 


28 If the data come from a population having the so-called negative binomial distribution, then 
the use of inverse hyperbolic sines may be more appropriate (Beall (1942); Bartlett (1947); 
Anscombe (1948)). 
Logarithmic transformation

This transformation is applicable when $\sigma_i^2 \propto \mu_i^2$ or $\sigma_i \propto \mu_i$, that is, when the
factor level standard deviation is proportional to the corresponding mean. This
type of situation arises when the distribution of scores is markedly skewed; the
transformation is also applicable when the scores are themselves standard deviations.
In such cases $s_i/\bar{y}_{i.}$ tends to be constant, and so a logarithmic transformation, that is,

$$y'_{ij} = \log_e(y_{ij}), \tag{2.22.6}$$

would stabilize the variance. If some of the measurements are small (particularly
zero), the recommended transformation is (Bartlett (1947))

$$y'_{ij} = \log_e(y_{ij} + 1). \tag{2.22.7}$$
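A made-up example in which the standard deviation is exactly proportional to the mean shows the effect of (2.22.6): the raw standard deviations grow with the mean, while the standard deviations of the logged scores coincide. This is a constructed illustration under that proportionality assumption, not data from the text.

```python
import math
from statistics import mean, stdev

# Made-up positive scores; each group is a rescaled copy of the same base sample,
# so the group standard deviation is exactly proportional to the group mean.
base = [0.8, 1.1, 0.9, 1.4, 1.0, 0.7, 1.3]
groups = [[c * y for y in base] for c in (1.0, 3.0, 10.0)]

for g in groups:
    logs = [math.log(y) for y in g]          # transformation (2.22.6)
    print(f"mean = {mean(g):6.3f}  sd = {stdev(g):6.3f}  "
          f"sd/mean = {stdev(g) / mean(g):.3f}  sd of log = {stdev(logs):.4f}")
```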


Square-root transformation 

This transformation is applicable when $\sigma_i^2 \propto \mu_i$, that is, when the means and
variances are proportional for each factor level. This type of situation is often
found when the observed variable $y_{ij}$ is a count, such as the number of auto
accidents in a given year. In this case, the sample statistic $s_i^2/\bar{y}_{i.}$ tends to be
constant, and so a square-root transformation such as

$$y'_{ij} = \sqrt{y_{ij}} \tag{2.22.8}$$

would stabilize the variance. If some of the observations $y_{ij}$’s are very small
(particularly zero), homogeneity of variance is more likely to be achieved by
the transformation²⁹ (Bartlett (1936))

$$y'_{ij} = \sqrt{y_{ij} + 0.5}. \tag{2.22.9}$$


The square-root transformation is usually applied to all data assumed to fol- 
low a Poisson distribution. For a discussion of the use of square-root trans- 
formation to perform analysis of variance for Poisson data, see Budescu and 
Applebaum (1981). 
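The variance-stabilizing effect of (2.22.8) and (2.22.9) on Poisson counts can be verified exactly by summing over the Poisson probabilities. In the Python sketch below (function name ours), the variance of the raw counts grows with the mean $\lambda$, whereas the variance of the square-root scores settles near 1/4; the $\sqrt{y_{ij} + 3/8}$ variant from footnote 29 is included for comparison.

```python
import math


def poisson_variance(transform, lam):
    """Exact variance of transform(X) for X ~ Poisson(lam), summing the pmf over a wide range."""
    upper = int(lam + 20 * math.sqrt(lam) + 50)   # the tail beyond this is negligible
    prob = math.exp(-lam)                         # P(X = 0)
    m1 = m2 = 0.0
    for k in range(upper + 1):
        v = transform(k)
        m1 += prob * v
        m2 += prob * v * v
        prob *= lam / (k + 1)                     # P(X = k + 1) from P(X = k)
    return m2 - m1 * m1


transforms = {
    "y": lambda y: y,
    "sqrt(y)": math.sqrt,                              # (2.22.8)
    "sqrt(y + 0.5)": lambda y: math.sqrt(y + 0.5),     # (2.22.9)
    "sqrt(y + 3/8)": lambda y: math.sqrt(y + 3 / 8),   # footnote 29
}
for lam in (2, 5, 10, 20, 40):
    row = ", ".join(f"{name}: {poisson_variance(f, lam):.4f}" for name, f in transforms.items())
    print(f"lambda = {lam:>2}: {row}")
```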


Reciprocal transformation 

This type of transformation is applicable when $\sigma_i \propto \mu_i^2$, that is, when the factor
level standard deviation is proportional to the square of the corresponding mean.
In this case $s_i/\bar{y}_{i.}^2$ tends to be constant, and an appropriate transformation to
stabilize the variance is the reciprocal transformation;³⁰ that is,

$$y'_{ij} = 1/y_{ij}. \tag{2.22.10}$$


29 The transformation $y'_{ij} = \sqrt{y_{ij} + 3/8}$ has an even better variance stabilizing property than
equation (2.22.9) (Anscombe (1948); Kihlberg et al. (1972)). Freeman and Tukey (1950) showed
that the transformation $y'_{ij} = \sqrt{y_{ij}} + \sqrt{y_{ij} + 1}$ will yield results similar to those of (2.22.9) but is
preferable for $y_{ij} < 2$.


The transformation (2.22.10) is generally used when $y'_{ij} = 1/y_{ij}$ has a definite
physical meaning and when the possibility of the random variable being less
than or equal to zero is negligible. For example, data on the failures of a machine
may be collected either as “intervals between failures” or as “the number of fail-
ures per unit time.” Similarly, the reciprocal of survival time is related to the
death rate, and the reciprocal of the waiting time until some phenomenon
occurs is related to the speed with which the phenomenon occurs.


Arcsine transformation 

This transformation is applicable when $\sigma_i^2 \propto p_i(1 - p_i)$, that is, when scores
are proportions. For example, the factor levels may be different treatment proce- 
dures, the unit of observation is a clinical center, and the observed variable y;; is 
the proportion of patients in the i-th treatment group for the j-th clinical center 
who benefited by the treatment. In this case an appropriate transformation to 
stabilize the variance is the arcsine transformation; that is, 


$$y'_{ij} = \arcsin\sqrt{y_{ij}}. \tag{2.22.11}$$


The transformed score using (2.22.11) is the angle whose sine is equal to the 
square root of the original score. Tables to facilitate this transformation have 
been prepared (see, e.g., Fisher and Yates (1963); Owen (1962)). 


Square transformation 
If the standard deviation decreases as the corresponding factor level mean in- 
creases, then the transformation 


$$y'_{ij} = y_{ij}^2 \tag{2.22.12}$$

would stabilize the variance. The transformation (2.22.12) is generally useful
when the distribution is skewed to the left.


Power transformation 

When there does not exist a theoretical basis to select a transformation, or the 
transformations described fail to achieve normality or homoscedasticity, a class 
of transformations proposed by Box and Cox (1964) can be used to achieve the 
desired objective. The general form of the transformation is given by 


$$f(y_{ij}) = \begin{cases} y_{ij}^{\lambda}, & \lambda \neq 0, \\ \log_e(y_{ij}), & \lambda = 0, \end{cases} \tag{2.22.13}$$

where $\lambda$ is a parameter to be determined from the data. The analyst tries different
values of $\lambda$ in (2.22.13) until the transformed scores conform to the assumption
in question. It should be noted that the transformation (2.22.13) includes the
following simple transformations as special cases:

$\lambda = -1$: $f(y_{ij}) = 1/y_{ij}$
$\lambda = -0.5$: $f(y_{ij}) = 1/\sqrt{y_{ij}}$
$\lambda = 0$: $f(y_{ij}) = \log_e(y_{ij})$ (by definition)
$\lambda = 0.5$: $f(y_{ij}) = \sqrt{y_{ij}}$
$\lambda = 2$: $f(y_{ij}) = y_{ij}^2$


30 If $y_{ij}$ represents counts, then $y'_{ij} = 1/(y_{ij} + 1)$ may be used to avoid the possibility of division
by zero.

These are some of the more commonly used transformations. Still other 
transformations can be found to be applicable for various other relationships 
between the means and variances. Furthermore, the transformations to stabilize 
the variance also often make the population distribution nearly normal. How-
ever, the use of such transformations may alter the pattern of differences among the group means.
It is possible that the means of the original scores are equal but the means of 
the transformed scores are not, and vice versa. Moreover, the means of trans- 
formed scores are often changed in ways that are not intuitively meaningful or 
are difficult to interpret. 
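The text leaves the choice of $\lambda$ in (2.22.13) to trial and error. The Python sketch below mechanizes that search under an assumption of ours: among a small grid containing the special cases listed above (plus $\lambda = 1$, no transformation), it keeps the value whose transformed groups have the smallest ratio of largest to smallest variance. Box and Cox (1964) instead choose $\lambda$ by maximum likelihood, which this sketch does not implement; the function names and data are hypothetical.

```python
import math
from statistics import variance


def power_transform(y, lam):
    """Family (2.22.13): y**lam when lam != 0 and log_e(y) when lam == 0."""
    return math.log(y) if lam == 0 else y ** lam


def variance_ratio(groups):
    """Largest over smallest group variance (Hartley's H) as a homogeneity measure."""
    v = [variance(g) for g in groups]
    return max(v) / min(v)


def choose_lambda(groups, grid=(-1.0, -0.5, 0.0, 0.5, 1.0, 2.0)):
    """Try each lam in grid and keep the one giving the most nearly equal variances."""
    ratios = {lam: variance_ratio([[power_transform(y, lam) for y in g] for g in groups])
              for lam in grid}
    return min(ratios, key=ratios.get), ratios


# Made-up data in which the standard deviation grows in proportion to the mean
base = [0.8, 1.1, 0.9, 1.4, 1.0, 0.7, 1.3]
groups = [[c * y for y in base] for c in (1.0, 2.0, 4.0)]
best, ratios = choose_lambda(groups)
for lam, r in ratios.items():
    print(f"lambda = {lam:5.1f}: max/min variance = {r:9.3f}")
print("chosen lambda:", best)
```

Because these groups are exact rescalings of one another, only the logarithm ($\lambda = 0$) equalizes their variances, in line with the logarithmic transformation discussed earlier; with real data the criterion is rarely this clear-cut.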


EXERCISES 


1. In an effort to increase the service life of a handbrake, an automobile
manufacturing company has developed three new designs. To assess
their performance against a standard design of a handbrake, 12 auto-
mobiles of a certain make were randomly chosen and assigned to four
different groups with 3 cars in each group. The four handbrake designs
were then randomly assigned to the groups, so that the 3 cars in each
group used a handbrake of the same design. The relevant data on service
life, measured in months, for each handbrake are given as follows.


Standard Design 21.2 13.4 17.0 
New Design-I 21.4 12.0 13.0 
New Design-II 3.2 9.1 4.2 
New Design-III 8.7 35.8 39.0


(a) Describe the model and the assumptions for the experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test to determine if there are signifi- 
cant differences in the average service life in the four groups of 
handbrakes. Use $\alpha = 0.05$.
(d) Carry out the test for homoscedasticity at $\alpha = 0.01$ by employing
(i) Bartlett’s test,
(ii) Hartley’s test,
(iii) Cochran’s test.


(e) Would you consider using appropriate contrasts? If so, perform 

the following, and interpret your results: 
(i) Orthogonal contrasts, 

(ii) Tukey’s procedure, 

(iii) Scheffé’s procedure.

(f) If it is found that the measures of service life for handbrakes have
a distribution skewed to the right, what transformation would be
appropriate to correct it? Make the required transformation on 
the data and repeat the analyses carried out in parts (b), (c), and 
(d). 

(g) Why were all the automobiles included in the experiment of a 
certain preselected model? Is it possible to generalize the results 
of this study to automobiles of any other model? 


2. A study was carried out to determine if different types of savings
institutions attract similar amounts of savings after adjusting for factors
such as advertising, years in operation, and the size of the neighborhoods
of the branches. A research analyst randomly selected 5
out of a large number of savings institutions included in the study
and from each of these 5 institutions, 5 branches were selected at
random. The total savings, in millions of dollars, in the 25 branches
included in the study are given as follows.


Types of Savings Institutions
A B C D E


37.2 33.4 37.5 31.0 30.9
38.4 37.7 36.6 33.4 37.0
36.0 38.8 35.8 36.7 36.2
31.3 32.8 37.0 39.0 38.1
32.4 33.7 35.6 37.1 36.8


(a) Describe the mathematical model and the assumptions involved. 
Would you use Model I or Model II? Explain. 

(b) Analyze the data and report the analysis of variance table. 

(c) Determine if there is significant evidence to conclude that the av- 
erage accumulated savings are not the same among the different 
types of savings institutions under study. Use $\alpha = 0.01$.

(d) Would you consider using contrasts? Explain. 

3. Calculations of the sums of squares for a one-way analysis of variance
from certain experimental data yielded the following results:

$SS_B = 570.23$, $SS_W = -19.72$, and $SS_T = 550.51$.


What can you say about the correctness of the results? Explain. 

4. A consumer organization carried out a study to determine whether 
the price being offered for a used car differed with the personality 
of the owner of the car. Four individuals were selected for the study, 
and each, pretending to be the owner, was sent to 5 different dealers. 
From each of the 20 dealers selected in the study the price quotes 
were obtained on a five-year-old, medium-priced car. The amounts
offered, in hundreds of dollars, by each of the 20 dealers in the study 
are given as follows. 


Owners 
A B C D 


40 40 34 35 
38 43 37 40 
40 41 38 37 
41 42 40 36 
37 43 35 34


(a) Describe the mathematical model and the assumptions involved. 
Would you consider using a Model I or Model II? Explain. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test to determine whether the price 
quotes differ according to the personality traits of the owner. 
Use $\alpha = 0.05$.

(d) What are the variance components associated with the assump- 
tions of the Model II? 

(e) Obtain appropriate point and interval estimates of the variance 
components identified in part (d). 

5. Out of three different textbooks published by three leading publish- 
ers, a Statistics professor is trying to choose one for adoption for his 
basic statistics class. He designed an experiment with 30 students of 
his class, whom he randomly assigned into three different groups, 
placing 10 in each group. The three textbooks, from John Wiley, 
Prentice-Hall, and Wadsworth, were then randomly assigned to each 
group. After the end of the course, all the students who completed 
the course took the same examination. The scores of the examination 
are given in the following. 
Textbooks


John Wiley Prentice-Hall Wadsworth 


80 55 65 
80 62 55 
81 80 62
71 70 67
81 70 58
75 66 72
82 77 70 
78 75 60 
36 52 
84 


(a) Describe the mathematical model and the assumptions involved. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test for the hypothesis that the average
scores using three different textbooks are the same. Use $\alpha = 0.05$.

(d) Carry out the test for homoscedasticity at $\alpha = 0.01$ by employing


(i) Bartlett’s test,
(ii) Hartley’s test,


(iii) Cochran’s test.


6. An automobile company wants to know the length of time during
which the premiums are given by the different agents employed by the
company. A study was conducted in which four agents were chosen
at random and the number of transactions completed by each agent
in a given week was recorded. The delay, in days, for completing
the transaction was noted for each sample case and the relevant data
are given as follows.


Agents 

I II III IV
9 19 21 22 
7 17 30 28 
9 22 32 23 
13 23 26 19 
10 28 29 20 
19 21 30 19 
17 21 21 24 
12 27 28 26 
6 25 33 27 
11 16 20 

10 35 


13 28 
(a) Describe the mathematical model and the assumptions involved.
Would you consider using Model I or Model II? Explain.

(b) Analyze the data and report the analysis of variance table.

(c) Perform an appropriate F test to determine if the mean delay
time varies from agent to agent. Use $\alpha = 0.01$.

(d) What are the variance components associated with the assump- 
tions of the Model II? 

(e) Obtain appropriate point and interval estimates of the variance 
components identified in part (d). 

7. Consider an experiment designed to investigate differences in
blood counts in three groups of monkeys randomly assigned
to two drugs and a control. The data on blood counts are given as
follows.


Type of Drug 

A B Control 
11.8 14.8 9.4 
10.9 11.7 10.5 
9.7 14.2 9.2
11.4 11.2 10.2 
12.6 11.8 
10.3 


(a) Describe the mathematical model and the assumptions for the 
experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test to determine if the average blood
count varies for the three types of drugs. Use $\alpha = 0.05$.

(d) Carry out the test for homoscedasticity at $\alpha = 0.05$ by employing


(i) Bartlett’s test,
(ii) Hartley’s test,
(iii) Cochran’s test.


(e) Determine 95% one-sided and two-sided confidence intervals 
for the single contrast comparing control with the mean of the 
other two drugs using Dunnett’s statistic and interpret your
results. 

8. A zoologist studying the structural traits of certain species of mam-
mals classifies them into three groups, small, medium, or large, ac-
cording to the size of the vertebrate. He selects a random sample
of size 8 from each group and then records the length of each animal in the
sample. The relevant data on length measurements in certain standard
units are given as follows.
Mammal Groups


Small Medium Large


8.1 11.4 8.6
8.8 11.2 7.1
10.5 10.6 7.4
‘he 7.7 9.0 
9.6 9.5 8.6 
9.8 8.1 9.1 
10.1 9.5 10.3 
12 12.1 9.5 


(a) Describe the mathematical model and the assumptions involved. 

(b) Analyze the data and report the appropriate analysis of vari-
ance table.

(c) Perform an appropriate F test for the hypothesis that the mean
length of each group is the same. Use $\alpha = 0.01$.

(d) Carry out the test for homoscedasticity at $\alpha = 0.01$ by employing


(i) Bartlett’s test,
(ii) Hartley’s test,
(iii) Cochran’s test.


(e) Would you consider using contrasts? If so, perform the following
and interpret your results:


(i) Orthogonal contrasts,
(ii) Tukey’s procedure,
(iii) Scheffé’s procedure.


9. Consider the null hypothesis $H_0: \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0$ versus
the alternative $H_1$: not all $\alpha_i$’s are zero.

(a) Determine three orthogonal contrasts. 

(b) Are the three orthogonal contrasts given in part (a) unique; that is, 
can you construct two or more separate sets of three orthogonal 
contrasts? 

(c) Can you construct four orthogonal contrasts? 

10. A manufacturing company employs a large number of presses that 
are used to produce certain automobile parts. A study was conducted 
to assess the performance of the presses. A sample of four presses 
was selected at random from the entire plant and then 10 parts were 
taken at random from the production line of each press. The measures 
on the length of the 40 parts were determined and the calculations 
on the sums of squares yielded the following results: 


SSz = 0.0264 and SS, = 0.0380 


(a) Describe the mathematical model and the assumptions involved. 
(b) Prepare the pertinent analysis of variance table.

(c) Perform an appropriate F test to determine if the presses are
similar in their average performance. Use $\alpha = 0.05$.

(d) Obtain appropriate point and interval estimates of the variance
components associated with the model assumed in part (a).

(e) Test the hypothesis that the ratio of the between and within variance
components is equal to or less than 1/2. Use $\alpha = 0.05$.

(f) Find an interval estimate for the ratio of the between and within
variance components and for the intraclass correlation using the confidence
coefficient of 0.95.

11. A study was performed to determine the effect of different fertilizers
upon potato yields. Three fertilizers, designated by N, P,
and K, were used. Each fertilizer was randomly assigned to 10 plots
and the yields were determined for each of the 30 plots. The yield
totals corresponding to the three fertilizer groups are


$Y_N = 50$, $Y_P = 70$, $Y_K = 100$;


and the total sum of squares is calculated to be 580. 

(a) Describe the mathematical model and the assumptions involved. 

(b) Prepare the pertinent analysis of variance table. 

(c) Perform an appropriate F test for the hypothesis that the average
yield for each fertilizer group is the same. Use $\alpha = 0.01$.

(d) Consider the following hypothesis of interest: the average for
N equals the average of the other two fertilizers. Carry out the
corresponding test of the hypothesis and state your conclusions.
Use $\alpha = 0.01$.

12. In a health and nutrition survey, 15 families, each spending a
comparable amount on their grocery bills, were administered
a survey questionnaire regarding their dietary habits. The families
were classified according to whether they lived in a rural, urban, or
suburban district, and the data on average daily protein consumption
are given as follows.


District 


Urban Suburban Rural 


371 365 491 
334 352 421 
358 362 441
300 321 461 
343 342 

302 


(a) Describe the mathematical model and the assumptions involved. 
(b) Analyze the data and report the analysis of variance table. 
(c) Perform an appropriate F test to determine if the average daily
protein consumptions are equal for the three districts. Use $\alpha = 0.05$.
(d) Carry out the test for homoscedasticity at $\alpha = 0.05$ by employing
(i) Bartlett's test,
(ii) Hartley's test,
(iii) Cochran's test.


(e) Would you consider using contrasts? If so, perform the following
and interpret your results:
(i) Orthogonal contrasts,
(ii) Tukey's procedure,
(iii) Scheffé's procedure.


A study was conducted to investigate the relationship between intelligence
and ability to concentrate. Thirty students were randomly selected
from a large psychology class and were administered tests of intelligence
and concentration ability. The students were classified into
five groups according to their concentration ability, and the data on
IQ score are given as follows.

Concentration Ability
I II III IV V
121 115 130 74 96
129 132 118 105 94
140 114 132 106 88
118 106 104
123 116 97
111 99 103
121 108
113 96
111
121


(a) Describe the mathematical model and the assumptions involved. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test to determine if the average IQ 
scores are equal for the five groups. Use $\alpha = 0.05$.

(d) Carry out the test for homoscedasticity at $\alpha = 0.05$ by employing
(i) Bartlett's test,
(ii) Hartley's test,
(iii) Cochran's test.

(e) Would you consider using contrasts? If so, perform the following
and interpret your results:
(i) Orthogonal contrasts,
(ii) Tukey's procedure,
(iii) Scheffé's procedure.


14. Hendy and Charles (1970) reported data on the silver content (% Ag)
of a number of Byzantine coins discovered in Cyprus. There were
nine coins from the first coinage of the reign of King Manuel I
Comnenus (1143-1180); seven of the coins came from the second
coinage, minted several years later, and four from the third coinage
(still later); another seven were from a fourth coinage. The question
of interest is whether there were significant differences in the silver
content of coins minted early and late in King Manuel's reign. The
data are given as follows.

First Coinage   Second Coinage   Third Coinage   Fourth Coinage
5.9             6.9              4.9             5.3
6.8             9.0              5.5             5.6
6.4             6.6              4.6             5.5
7.0             8.1              4.5             5.1
6.6             9.3                              6.2
7.7             9.2                              5.8
7.2             8.6                              5.8
6.9
6.2

Source: Hendy and Charles (1970). Used with permission.


(a) Describe the mathematical model and the assumptions involved. 
(b) Analyze the data and report the analysis of variance table. 
(c) Perform an appropriate F test to determine if the average silver
contents are equal for the four coinages. Use $\alpha = 0.05$.
(d) Carry out the test for homoscedasticity at $\alpha = 0.05$ by employing
(i) Bartlett's test,
(ii) Hartley's test,
(iii) Cochran's test.

(e) Would you consider using contrasts? If so, perform the following
and interpret your results:
(i) Orthogonal contrasts,
(ii) Tukey's procedure,
(iii) Scheffé's procedure.


15. Anionwu et al. (1981) reported data on steady-state haemoglobin
levels for patients with different types of sickle cell disease. The
question of interest is whether the steady-state haemoglobin levels
differ significantly between patients with different types. The data
are given as follows.

Type of Sickle Cell Disease
HB SS   HB S/β-thalassaemia   HB SC
7.2     8.1                   10.7
7.7     9.2                   11.3
8.0     10.0                  11.5
8.1     10.4                  11.6
8.3     10.6                  11.7
8.4     10.9                  11.8
8.4     11.1                  12.0
8.5     11.9                  12.1
8.6     12.0                  12.3
8.7     12.1                  12.6
9.1                           12.6
9.1                           13.3
9.1                           13.3
9.8                           13.8
10.1                          13.9
10.3

Source: Anionwu et al. (1981). Used with permission.


(a) Describe the mathematical model and the assumptions involved. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test to determine if the average levels 
of haemoglobin are equal for the three types of sickle cell disease. 
Use $\alpha = 0.05$.

(d) Carry out the test for homoscedasticity at $\alpha = 0.05$ by employing
(i) Bartlett's test,
(ii) Hartley's test,
(iii) Cochran's test.
(e) Would you consider using contrasts? If so, perform the following
and interpret your results:
(i) Orthogonal contrasts,
(ii) Tukey's procedure,
(iii) Scheffé’s procedure. 
Sokal and Rohlf (1994, p. 237) reported data on the number of eggs 
laid per female per day for the first 14 days of life (per diem fecundity) 
for 25 females of each of three genetic lines of the fruit fly Drosophila
melanogaster. The genetic lines labelled RS and SS were selectively
bred for resistance and susceptibility to DDT, respectively,
and the line NS is a nonselected control strain. The purpose of the
study was to investigate whether the two selected lines (RS and SS)
differ in fecundity from the nonselected line, and whether the RS line
differs in fecundity from the SS line. The data are given as follows.


Genetic Lines 


Resistant (RS) Susceptible (SS) Nonselected (NS) 
12.8 38.4 35.4 
21.6 32.9 27.4 
14.8 48.5 19.3 
23.1 20.9 41.8 
34.6 11.6 20.3 
19.7 22.3 37.6 
22.6 30.2 36.9 
29.6 33.4 37.3 
16.4 26.7 28.2 
20.3 39.0 23.4 
29.3 12.8 33.7
14.9 14.6 29.2 
27.3 12.2 41.7 
22.4 23.1 22.6 
27.5 29.4 40.4 
20.3 16.0 34.4 
38.7 20.1 30.4 
26.4 23.3 14.9 
23.4 22.9 51.8
26.1 22.9 33.8 
29.5 15.1 37.9 
38.6 31.0 29.5 
44.4 16.9 42.4 
23.2 16.1 36.6 
23.6 10.8 47.4 


Source: Sokal and Rohlf (1994, p. 237). Used with permission. 


(a) Describe the mathematical model and the assumptions involved. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test to determine if the average fecun- 
dity levels are equal for the three genetic lines. Use a = 0.05. 

(d) Carry out the test for homoscedasticity at $\alpha = 0.05$ by employing
(i) Bartlett's test,
(ii) Hartley's test,
(iii) Cochran's test.

(e) Would you consider using contrasts? If so, test the following
contrasts and interpret your results: (i) RS + SS - 2NS,
(ii) RS - SS.


Two-Way Crossed 
Classification Without 
Interaction 


3.0 PREVIEW 


The major advantage of the one-way classification (one-factor design) discussed 
in the preceding chapter is its simplicity, which extends to the experimental 
layout, the model and assumptions underlying the analysis of variance, and the 
computations involved in the analysis. The major disadvantage of such a design 
is its relative inefficiency. The error variance will usually be large compared to 
that resulting from other designs. This is in part offset by the fact that no other 
design yields as many degrees of freedom for the error variance as does this 
design. 

In many investigations, however, it is desirable to measure response at com- 
binations of levels of two or more factors considered simultaneously. For ex- 
ample, we might desire to investigate blood pressure for different gender and 
ethnic groups, or to investigate weight loss comparing four diets among urban, 
suburban, and rural subjects according to their gender, or to investigate miles 
per gallon among five makes of automobiles for both city and country driv- 
ing. In investigations involving many factors, the effect of each factor on the 
response variable may be analyzed using one-way classification. Such an anal- 
ysis, however, will not be economical or efficient with respect to time, effort, 
and money. Moreover, such a procedure would give no information about the 
possible interactions that may exist among different factors. 

The theory of analysis of variance permits the investigation of several factors
or independent variables within the same experiment. Such a procedure is
efficient, time saving, and, equally important, it permits the investigation of the 
joint effects of several factors or interactions between them. This and the next 
chapter deal with the statistical model and analysis of variance involving two 
factors such that every level of one factor included in the experiment occurs 
with every level of the second factor and vice versa. Such a layout is termed 
two-way crossed classification. Two-way crossed layout allows a researcher to 
examine fully the main effects of both factors and their interactions. The term 
crossed classification or cross-classification comes from the fact that in many fields of
investigation, the measurements or observations can be classified in the form of 
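TABLE 3.1
Data for a Two-Way Experimental Layout

$$
\begin{array}{ll|cccccc}
 & & \multicolumn{6}{c}{\text{Factor } B} \\
 & & B_1 & B_2 & \cdots & B_j & \cdots & B_b \\
\hline
 & A_1 & y_{11} & y_{12} & \cdots & y_{1j} & \cdots & y_{1b} \\
 & A_2 & y_{21} & y_{22} & \cdots & y_{2j} & \cdots & y_{2b} \\
 & \vdots & \vdots & \vdots & & \vdots & & \vdots \\
\text{Factor } A & A_i & y_{i1} & y_{i2} & \cdots & y_{ij} & \cdots & y_{ib} \\
 & \vdots & \vdots & \vdots & & \vdots & & \vdots \\
 & A_a & y_{a1} & y_{a2} & \cdots & y_{aj} & \cdots & y_{ab}
\end{array}
$$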


a two-way table where the rows of the table correspond to the levels of a factor 
and the columns to the levels of another factor. In a crossed classification it is 
customary to refer to the combinations of levels of the factors as cells rather 
than treatments. 


3.1 MATHEMATICAL MODEL 


Two factors are said to be crossed if the data contain observations at each 
combination of a level of one factor with a level of the other factor. Consider 
two factors A and B having $a$ and $b$ levels, respectively, and let there be exactly
one observation in each of the $a \times b$ cells of the two-way layout. Let $y_{ij}$ be the
observed score corresponding to the $i$-th level of factor A and the $j$-th level of
factor B. The data involving the total of $N = ab$ scores $y_{ij}$ can then be
schematically represented as in Table 3.1.

The analysis of variance model for this type of experimental layout is given 
as 


$$y_{ij} = \mu + \alpha_i + \beta_j + e_{ij} \qquad (i = 1, 2, \ldots, a;\ j = 1, 2, \ldots, b), \qquad (3.1.1)$$

where $-\infty < \mu < \infty$ is the overall mean, $\alpha_i$ is the effect due to the $i$-th level
of factor A, $\beta_j$ is the effect due to the $j$-th level of factor B, and $e_{ij}$ is
the error term that takes into account the random variation within a particular
cell. The model (3.1.1) states that the observed score $y_{ij}$ corresponding to the
$(i, j)$-th cell consists of the sum of the components: (i) the grand mean $\mu$,
(ii) the effect $\alpha_i$ associated with the $i$-th level of factor A, (iii) the effect $\beta_j$
associated with the $j$-th level of factor B, and (iv) an error term $e_{ij}$, which is
strictly peculiar to the $(i, j)$-th cell.
3.2 ASSUMPTIONS OF THE MODEL


Similar to the one-way classification model (2.1.1), the following assumptions
are made in order to make inferences about the existence of effects in model
(3.1.1).

(i) The errors $e_{ij}$ are assumed to be randomly distributed with mean zero
and common variance $\sigma_e^2$.

(ii) The errors associated with any pair of observations are assumed to be
uncorrelated; that is,

$$E(e_{ij} e_{i'j'}) = \begin{cases} 0, & i \neq i' \text{ or } j \neq j', \\ \sigma_e^2, & i = i' \text{ and } j = j'. \end{cases} \qquad (3.2.1)$$

(iii) Under Model I, the effects $\alpha_i$ and $\beta_j$ are assumed to be fixed constants
subject to the constraints

$$\sum_{i=1}^{a} \alpha_i = \sum_{j=1}^{b} \beta_j = 0.$$

This implies that the observations $y_{ij}$ are distributed with mean
$\mu + \alpha_i + \beta_j$ and common variance $\sigma_e^2$.

(iv) Under Model II, the effects $\alpha_i$ and $\beta_j$ are assumed to be randomly
distributed with zero means and variances $\sigma_\alpha^2$ and $\sigma_\beta^2$, respectively.
Furthermore, the $\alpha_i$, $\beta_j$, and $e_{ij}$ are mutually and completely uncorrelated;
that is, in addition to (3.2.1), the following relations hold:

$$\begin{aligned}
E(\alpha_i \alpha_{i'}) &= 0, \quad i \neq i'; \\
E(\beta_j \beta_{j'}) &= 0, \quad j \neq j'; \\
E(\alpha_i \beta_j) &= 0, \quad \text{all } (i, j); \\
E(\alpha_i e_{ij}) &= 0, \quad \text{all } (i, j); \\
E(\beta_j e_{ij}) &= 0, \quad \text{all } (i, j).
\end{aligned} \qquad (3.2.2)$$

Then, from the model equation (3.1.1), we have $\sigma_y^2 = \sigma_e^2 + \sigma_\beta^2 + \sigma_\alpha^2$,
and thus $\sigma_e^2$, $\sigma_\beta^2$, and $\sigma_\alpha^2$ are components of $\sigma_y^2$, the variance of an
observation. This implies that the observations $y_{ij}$ are distributed with
mean $\mu$ and common variance $\sigma_e^2 + \sigma_\beta^2 + \sigma_\alpha^2$.

(v) Under Model III, the effects $\alpha_i$ are assumed to be fixed subject to the
constraint $\sum_{i=1}^{a} \alpha_i = 0$, and the effects $\beta_j$ are assumed to be randomly
distributed with mean zero and variance $\sigma_\beta^2$. Furthermore, as before, the
$\beta_j$ are uncorrelated with each other, and the $\beta_j$ and $e_{ij}$ are
also uncorrelated; that is,

$$\begin{aligned}
E(\beta_j \beta_{j'}) &= 0, \quad j \neq j'; \\
E(\beta_j e_{ij}) &= 0, \quad \text{all } (i, j).
\end{aligned} \qquad (3.2.3)$$

In this case, $\sigma_y^2 = \sigma_\beta^2 + \sigma_e^2$, so $\sigma_\beta^2$ and $\sigma_e^2$ are components of $\sigma_y^2$,
and the $y_{ij}$ are distributed with mean $\mu + \alpha_i$ and common variance
$\sigma_\beta^2 + \sigma_e^2$.


Remark: Under the assumptions of the random effects model in (3.1.1), the observations
within the same level of factor A have the correlation given by
$\rho_\alpha = \sigma_\alpha^2/(\sigma_e^2 + \sigma_\beta^2 + \sigma_\alpha^2)$. Similarly, the observations within the same level of factor B have
the correlation given by $\rho_\beta = \sigma_\beta^2/(\sigma_e^2 + \sigma_\beta^2 + \sigma_\alpha^2)$. Under the assumptions of the mixed
model in (3.1.1), the observations within the same level of the random factor B have
the correlation given by $\rho_\beta = \sigma_\beta^2/(\sigma_e^2 + \sigma_\beta^2)$. These correlations are referred to as the
intraclass correlations.
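As a small illustration of the Remark, the three intraclass correlations can be computed directly from assumed values of the variance components. This is a minimal Python sketch; the function name `intraclass_correlations` and the numerical values in the usage line are hypothetical, chosen only to show the arithmetic.

```python
def intraclass_correlations(var_alpha, var_beta, var_e):
    """Return (rho_alpha, rho_beta, rho_beta_mixed) for model (3.1.1).

    var_alpha, var_beta, var_e are assumed values of sigma_alpha^2,
    sigma_beta^2 and sigma_e^2 (hypothetical inputs, not estimates).
    """
    total = var_alpha + var_beta + var_e
    rho_alpha = var_alpha / total                    # same level of A, random effects model
    rho_beta = var_beta / total                      # same level of B, random effects model
    rho_beta_mixed = var_beta / (var_beta + var_e)   # same level of random B, mixed model
    return rho_alpha, rho_beta, rho_beta_mixed


print(intraclass_correlations(2.0, 1.0, 1.0))  # (0.5, 0.25, 0.5)
```

Note that under the mixed model the fixed $\alpha_i$ do not enter the denominator, which is why the third value differs from the second.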


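3.3 PARTITION OF THE TOTAL SUM OF SQUARES

The total variation or total sum of squares with respect to model (3.1.1) is
$\sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij} - \bar{y}_{..})^2$, which can be partitioned as follows:

$$\begin{aligned}
\sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij} - \bar{y}_{..})^2
&= \sum_{i=1}^{a}\sum_{j=1}^{b}\big[(\bar{y}_{i.} - \bar{y}_{..}) + (\bar{y}_{.j} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})\big]^2 \\
&= \sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{y}_{.j} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2 \\
&= b\sum_{i=1}^{a}(\bar{y}_{i.} - \bar{y}_{..})^2 + a\sum_{j=1}^{b}(\bar{y}_{.j} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2,
\end{aligned} \qquad (3.3.1)$$

where

$$\bar{y}_{i.} = \frac{1}{b}\sum_{j=1}^{b} y_{ij}, \quad \bar{y}_{.j} = \frac{1}{a}\sum_{i=1}^{a} y_{ij}, \quad \text{and} \quad \bar{y}_{..} = \frac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij}.$$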


The identity (3.3.1) is valid since all the cross-product terms are equal to zero.
The first two sums of squares on the right-hand side of (3.3.1) measure the variation
due to the $\alpha_i$ and $\beta_j$, respectively, and the last one corresponds to the errors
$e_{ij}$. We use the notation $SS_A$, $SS_B$, and $SS_E$ to denote the sums of squares
due to the $\alpha_i$, $\beta_j$, and $e_{ij}$, respectively. The corresponding mean squares,
obtained by dividing $SS_A$, $SS_B$, and $SS_E$ by $(a - 1)$, $(b - 1)$, and $(a - 1)(b - 1)$,
respectively, are denoted by $MS_A$, $MS_B$, and $MS_E$, respectively. Here, $(a - 1)$,
$(b - 1)$, and $(a - 1)(b - 1)$ are obtained by partitioning the total degrees of
freedom $ab - 1$ into three components: due to the $\alpha_i$, $\beta_j$, and $e_{ij}$, respectively.
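The partition (3.3.1) and the resulting mean squares can be checked numerically. The Python sketch below computes each sum of squares from its deviation form and confirms that they add up to the total. The 3 x 4 table `y` and the helper name `partition_ss` are hypothetical; the numbers were made up solely to illustrate the arithmetic and are not data from the text.

```python
# Hypothetical 3 x 4 layout (rows = levels of A, columns = levels of B);
# the numbers are invented purely to illustrate the arithmetic.
y = [[12, 15, 14, 17],
     [10, 13, 13, 16],
     [8, 12, 10, 14]]


def partition_ss(y):
    """Return (SS_T, SS_A, SS_B, SS_E) computed from the deviation forms in (3.3.1)."""
    a, b = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (a * b)                       # y-bar ..
    row_mean = [sum(row) / b for row in y]                             # y-bar i.
    col_mean = [sum(y[i][j] for i in range(a)) / a for j in range(b)]  # y-bar .j
    ss_t = sum((y[i][j] - grand) ** 2 for i in range(a) for j in range(b))
    ss_a = b * sum((m - grand) ** 2 for m in row_mean)
    ss_b = a * sum((m - grand) ** 2 for m in col_mean)
    ss_e = sum((y[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
               for i in range(a) for j in range(b))
    return ss_t, ss_a, ss_b, ss_e


ss_t, ss_a, ss_b, ss_e = partition_ss(y)
a, b = len(y), len(y[0])
# Mean squares: divide by the degrees of freedom (a - 1), (b - 1), (a - 1)(b - 1)
ms_a, ms_b, ms_e = ss_a / (a - 1), ss_b / (b - 1), ss_e / ((a - 1) * (b - 1))
print(round(ss_t, 4), round(ss_a + ss_b + ss_e, 4))  # 75.6667 75.6667
```

Because the cross-product terms vanish, the two printed values agree up to floating-point rounding.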


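Remark: The three cross-product terms arising in equation (3.3.1) are:

$$2\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{y}_{i.} - \bar{y}_{..})(\bar{y}_{.j} - \bar{y}_{..}) = 2\sum_{i=1}^{a}(\bar{y}_{i.} - \bar{y}_{..})\sum_{j=1}^{b}(\bar{y}_{.j} - \bar{y}_{..}) = 2\sum_{i=1}^{a}(\bar{y}_{i.} - \bar{y}_{..})(b\bar{y}_{..} - b\bar{y}_{..}) = 0;$$

$$\begin{aligned}
2\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{y}_{i.} - \bar{y}_{..})(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})
&= 2\sum_{i=1}^{a}(\bar{y}_{i.} - \bar{y}_{..})\sum_{j=1}^{b}(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..}) \\
&= 2\sum_{i=1}^{a}(\bar{y}_{i.} - \bar{y}_{..})(b\bar{y}_{i.} - b\bar{y}_{i.} - b\bar{y}_{..} + b\bar{y}_{..}) = 0;
\end{aligned}$$

and

$$\begin{aligned}
2\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{y}_{.j} - \bar{y}_{..})(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})
&= 2\sum_{j=1}^{b}(\bar{y}_{.j} - \bar{y}_{..})\sum_{i=1}^{a}(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..}) \\
&= 2\sum_{j=1}^{b}(\bar{y}_{.j} - \bar{y}_{..})(a\bar{y}_{.j} - a\bar{y}_{..} - a\bar{y}_{.j} + a\bar{y}_{..}) = 0.
\end{aligned}$$

Thus, all cross-product terms are equal to zero.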


130 The Analysis of Variance 


3.4 MEAN SQUARES AND THEIR EXPECTATIONS 


Next, we examine the expectations of the mean squares. On taking successive 
averages of the model equation (3.1.1), we obtain 


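$$\bar{y}_{i.} = \mu + \alpha_i + \bar{\beta} + \bar{e}_{i.}, \qquad (3.4.1)$$

$$\bar{y}_{.j} = \mu + \bar{\alpha} + \beta_j + \bar{e}_{.j}, \qquad (3.4.2)$$
and
$$\bar{y}_{..} = \mu + \bar{\alpha} + \bar{\beta} + \bar{e}_{..}. \qquad (3.4.3)$$

Substituting the values of $y_{ij}$, $\bar{y}_{i.}$, $\bar{y}_{.j}$, and $\bar{y}_{..}$ from (3.1.1), (3.4.1), (3.4.2), and
(3.4.3), respectively, into the expressions for $SS_A$, $SS_B$, and $SS_E$ defined in
(3.3.1), we find that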


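$$SS_E = \sum_{i=1}^{a}\sum_{j=1}^{b}(e_{ij} - \bar{e}_{i.} - \bar{e}_{.j} + \bar{e}_{..})^2, \qquad (3.4.4)$$

$$SS_B = a\sum_{j=1}^{b}(\beta_j - \bar{\beta} + \bar{e}_{.j} - \bar{e}_{..})^2, \qquad (3.4.5)$$
and
$$SS_A = b\sum_{i=1}^{a}(\alpha_i - \bar{\alpha} + \bar{e}_{i.} - \bar{e}_{..})^2. \qquad (3.4.6)$$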


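Now, because the $e_{ij}$ are uncorrelated and identically distributed with mean
zero and variance $\sigma_e^2$, it follows that

$$E(e_{ij}^2) = \sigma_e^2, \qquad (3.4.7)$$

$$E(\bar{e}_{i.}^2) = \sigma_e^2/b, \qquad (3.4.8)$$

$$E(\bar{e}_{.j}^2) = \sigma_e^2/a, \qquad (3.4.9)$$
and
$$E(\bar{e}_{..}^2) = \sigma_e^2/ab. \qquad (3.4.10)$$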


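It is then a matter of straightforward computations to derive the expectations of
mean squares. First, taking the expectation of (3.4.4), we obtain

$$\begin{aligned}
E(SS_E) &= \sum_{i=1}^{a}\sum_{j=1}^{b} E(e_{ij} - \bar{e}_{i.} - \bar{e}_{.j} + \bar{e}_{..})^2 \\
&= \sum_{i=1}^{a}\sum_{j=1}^{b}\big[E(e_{ij}^2) + E(\bar{e}_{i.}^2) + E(\bar{e}_{.j}^2) + E(\bar{e}_{..}^2) - 2E(e_{ij}\bar{e}_{i.}) - 2E(e_{ij}\bar{e}_{.j}) + 2E(e_{ij}\bar{e}_{..}) \\
&\qquad + 2E(\bar{e}_{i.}\bar{e}_{.j}) - 2E(\bar{e}_{i.}\bar{e}_{..}) - 2E(\bar{e}_{.j}\bar{e}_{..})\big] \\
&= ab\,\sigma_e^2\left[1 + \frac{1}{b} + \frac{1}{a} + \frac{1}{ab} - \frac{2}{b} - \frac{2}{a} + \frac{2}{ab} + \frac{2}{ab} - \frac{2}{ab} - \frac{2}{ab}\right] \\
&= (ab - a - b + 1)\sigma_e^2 \\
&= (a - 1)(b - 1)\sigma_e^2.
\end{aligned} \qquad (3.4.11)$$

The expectation of $MS_E$ is, therefore, given by

$$E(MS_E) = E\left[\frac{SS_E}{(a - 1)(b - 1)}\right] = \sigma_e^2. \qquad (3.4.12)$$

Note that the result (3.4.12) is true under the assumptions of fixed and random
as well as mixed effects models.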

Now, to derive the expectations of $MS_B$ and $MS_A$, we consider the cases of
Models I, II, and III separately. 


MODEL I (FIXED EFFECTS) 


Under Model I, the $\alpha_i$ and $\beta_j$ are fixed quantities depending on the particular
levels included in the experiment, with the restriction that $\bar{\alpha} = \bar{\beta} = 0$. First,
on taking the expectation of (3.4.5), we obtain

$$E(SS_B) = a\left[\sum_{j=1}^{b}\beta_j^2 + E\sum_{j=1}^{b}(\bar{e}_{.j} - \bar{e}_{..})^2\right], \qquad (3.4.13)$$

by virtue of the fact that the $\beta_j$ are constants and the expectation of the cross-product
term is zero. Now, using the results (3.4.9) and (3.4.10), we find that

$$E\sum_{j=1}^{b}(\bar{e}_{.j} - \bar{e}_{..})^2 = \sum_{j=1}^{b}E(\bar{e}_{.j}^2) - bE(\bar{e}_{..}^2) = \frac{b\sigma_e^2}{a} - \frac{\sigma_e^2}{a} = \frac{(b - 1)\sigma_e^2}{a}. \qquad (3.4.14)$$

Furthermore, on substituting (3.4.14) into (3.4.13), we obtain

$$E(SS_B) = a\sum_{j=1}^{b}\beta_j^2 + (b - 1)\sigma_e^2. \qquad (3.4.15)$$

Therefore, the expectation of $MS_B$ is given by

$$E(MS_B) = E\left(\frac{SS_B}{b - 1}\right) = \sigma_e^2 + \frac{a}{b - 1}\sum_{j=1}^{b}\beta_j^2. \qquad (3.4.16)$$

Similarly, from symmetry, it follows that the expectation of $MS_A$ is given by

$$E(MS_A) = \sigma_e^2 + \frac{b}{a - 1}\sum_{i=1}^{a}\alpha_i^2. \qquad (3.4.17)$$


MODEL II (RANDOM EFFECTS) 


Under Model II, the $\alpha_i$ and $\beta_j$ are also randomly distributed with mean zero
and variances $\sigma_\alpha^2$ and $\sigma_\beta^2$, respectively. It then follows, using the formulae for
the variances of the sampling distribution of the means of the $\alpha_i$ and $\beta_j$, that

$$E(\alpha_i^2) = \sigma_\alpha^2, \qquad (3.4.18)$$

$$E(\bar{\alpha}^2) = \sigma_\alpha^2/a, \qquad (3.4.19)$$

$$E(\beta_j^2) = \sigma_\beta^2, \qquad (3.4.20)$$
and
$$E(\bar{\beta}^2) = \sigma_\beta^2/b. \qquad (3.4.21)$$

Now, on taking the expectation of (3.4.5), we get

$$E(SS_B) = a\left[E\sum_{j=1}^{b}(\beta_j - \bar{\beta})^2 + E\sum_{j=1}^{b}(\bar{e}_{.j} - \bar{e}_{..})^2\right], \qquad (3.4.22)$$

since the expectation of the cross-product term is zero. Furthermore, using the
results (3.4.20) and (3.4.21), we find that

$$E\sum_{j=1}^{b}(\beta_j - \bar{\beta})^2 = \sum_{j=1}^{b}E(\beta_j^2) - bE(\bar{\beta}^2) = b\sigma_\beta^2 - \sigma_\beta^2 = (b - 1)\sigma_\beta^2. \qquad (3.4.23)$$

On substituting (3.4.14) and (3.4.23) into (3.4.22), we obtain

$$E(SS_B) = a\left[(b - 1)\sigma_\beta^2 + \frac{(b - 1)\sigma_e^2}{a}\right] = a(b - 1)\sigma_\beta^2 + (b - 1)\sigma_e^2. \qquad (3.4.24)$$

Therefore, the expectation of $MS_B$ is given by

$$E(MS_B) = E\left(\frac{SS_B}{b - 1}\right) = \sigma_e^2 + a\sigma_\beta^2. \qquad (3.4.25)$$

Similarly, from symmetry, it follows that the expectation of $MS_A$ is given by

$$E(MS_A) = E\left(\frac{SS_A}{a - 1}\right) = \sigma_e^2 + b\sigma_\alpha^2. \qquad (3.4.26)$$
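The expected mean squares (3.4.12), (3.4.25), and (3.4.26) can be checked informally by simulation. The Python sketch below is a Monte Carlo check, not part of the derivation: it draws normal effects for convenience (the expectations themselves only need zero means, the stated variances, and uncorrelatedness), and the layout size, variance components, overall mean, seed, and number of replications are arbitrary choices made for illustration. The helper names `mean_squares` and `simulate_model_ii` are hypothetical.

```python
import random


def mean_squares(y):
    """Return (MS_A, MS_B, MS_E) for an a x b layout with one observation per cell."""
    a, b = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (a * b)
    row_mean = [sum(row) / b for row in y]
    col_mean = [sum(y[i][j] for i in range(a)) / a for j in range(b)]
    ss_a = b * sum((m - grand) ** 2 for m in row_mean)
    ss_b = a * sum((m - grand) ** 2 for m in col_mean)
    ss_e = sum((y[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
               for i in range(a) for j in range(b))
    return ss_a / (a - 1), ss_b / (b - 1), ss_e / ((a - 1) * (b - 1))


def simulate_model_ii(a, b, var_alpha, var_beta, var_e, reps, seed=1):
    """Average MS_A, MS_B, MS_E over `reps` data sets generated from Model II."""
    rng = random.Random(seed)
    sums = [0.0, 0.0, 0.0]
    for _ in range(reps):
        alpha = [rng.gauss(0.0, var_alpha ** 0.5) for _ in range(a)]
        beta = [rng.gauss(0.0, var_beta ** 0.5) for _ in range(b)]
        mu = 10.0  # overall mean; it cancels out of every mean square
        y = [[mu + alpha[i] + beta[j] + rng.gauss(0.0, var_e ** 0.5)
              for j in range(b)] for i in range(a)]
        for k, ms in enumerate(mean_squares(y)):
            sums[k] += ms
    return [s / reps for s in sums]


a, b = 3, 4
var_alpha, var_beta, var_e = 4.0, 1.0, 1.0
avg_a, avg_b, avg_e = simulate_model_ii(a, b, var_alpha, var_beta, var_e, reps=5000)
# Theory: E(MS_A) = var_e + b*var_alpha = 17, E(MS_B) = var_e + a*var_beta = 4, E(MS_E) = var_e = 1
print(round(avg_a, 2), round(avg_b, 2), round(avg_e, 2))
```

The simulated averages should land close to 17, 4, and 1; the agreement is only approximate because of Monte Carlo error, which is largest for $MS_A$ here.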


MODEL III (MIXED EFFECTS)

Under Model III, the $\alpha_i$ are fixed quantities and the $\beta_j$ are randomly distributed
with mean zero and variance $\sigma_\beta^2$. Furthermore, the $\beta_j$ are uncorrelated
with each other, and the $\beta_j$ and $e_{ij}$ are also uncorrelated. Therefore,
using the results (3.4.17) and (3.4.25), it follows that the expectations of $MS_B$
and $MS_A$ are given by

$$E(MS_B) = \sigma_e^2 + a\sigma_\beta^2$$

and

$$E(MS_A) = \sigma_e^2 + \frac{b}{a - 1}\sum_{i=1}^{a}\alpha_i^2.$$


The foregoing results of Sections 3.3 and 3.4 can now be summarized in a 
tabular form as the analysis of variance table shown in Table 3.2. 


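TABLE 3.2
Analysis of Variance for Model (3.1.1)

$$
\begin{array}{lcccccc}
\hline
\text{Source of} & \text{Degrees of} & \text{Sum of} & \text{Mean} & \multicolumn{3}{c}{\text{Expected Mean Square}} \\
\text{Variation} & \text{Freedom} & \text{Squares} & \text{Square} & \text{Model I} & \text{Model II} & \text{Model III} \\
\hline
\text{Due to } A & a - 1 & SS_A & MS_A & \sigma_e^2 + \dfrac{b}{a - 1}\sum_{i=1}^{a}\alpha_i^2 & \sigma_e^2 + b\sigma_\alpha^2 & \sigma_e^2 + \dfrac{b}{a - 1}\sum_{i=1}^{a}\alpha_i^2 \\
\text{Due to } B & b - 1 & SS_B & MS_B & \sigma_e^2 + \dfrac{a}{b - 1}\sum_{j=1}^{b}\beta_j^2 & \sigma_e^2 + a\sigma_\beta^2 & \sigma_e^2 + a\sigma_\beta^2 \\
\text{Error} & (a - 1)(b - 1) & SS_E & MS_E & \sigma_e^2 & \sigma_e^2 & \sigma_e^2 \\
\text{Total} & ab - 1 & SS_T & & & & \\
\hline
\end{array}
$$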
3.5 SAMPLING DISTRIBUTION OF MEAN SQUARES

It is important to recognize that in the derivations of the results on expected mean
squares given in the preceding section, we have not made any distributional
assumptions for the $e_{ij}$ under Model I; for the $\alpha_i$, $\beta_j$, and $e_{ij}$ under Model II; and for the
$\beta_j$ and $e_{ij}$ under Model III. However, to derive the form of their sampling
distributions, we require the assumption of normality for the random components
of model (3.1.1). Thus, under Model I, we assume that the $e_{ij}$ are independent
normal random variables with mean zero and variance $\sigma_e^2$. Under Model
II, all $\alpha_i$, $\beta_j$, and $e_{ij}$ are mutually and completely independent normal
random variables with mean zero and variances $\sigma_\alpha^2$, $\sigma_\beta^2$, and $\sigma_e^2$, respectively.
Finally, under Model III, the $\alpha_i$ are constants subject to the restriction that
$\sum_{i=1}^{a}\alpha_i = 0$, and the $\beta_j$ and $e_{ij}$ are mutually and completely independent
normal random variables with mean zero and variances $\sigma_\beta^2$ and $\sigma_e^2$, respectively.

In the following we give the results on sampling distributions of mean squares
for fixed, random, and mixed effects models. The derivation of these results is
beyond the scope of this volume and can be found in Scheffé (1959), Graybill
(1961), and Searle (1971b).


MODEL I (FIXED EFFECTS)
Under the distribution assumptions of Model I, it can be shown that:

(a) The quantities $MS_E$, $MS_B$, and $MS_A$ are statistically independent.
(b) The following results are true:

(i)
$$\frac{MS_E}{\sigma_e^2} \sim \frac{\chi^2[(a - 1)(b - 1)]}{(a - 1)(b - 1)}, \qquad (3.5.1)$$
(ii)
$$\frac{MS_B}{\sigma_e^2} \sim \frac{\chi'^2[b - 1, \lambda_B]}{b - 1}, \qquad (3.5.2)$$
and
(iii)
$$\frac{MS_A}{\sigma_e^2} \sim \frac{\chi'^2[a - 1, \lambda_A]}{a - 1}, \qquad (3.5.3)$$

where, as usual, $\chi^2[\cdot]$ denotes a central and $\chi'^2[\cdot, \cdot]$ denotes a noncentral chi-square
variable with respective degrees of freedom, and the noncentrality parameters
$\lambda_B$ and $\lambda_A$ are defined by

$$\lambda_B = \frac{a}{2\sigma_e^2}\sum_{j=1}^{b}\beta_j^2$$

and

$$\lambda_A = \frac{b}{2\sigma_e^2}\sum_{i=1}^{a}\alpha_i^2.$$

It, therefore, follows from (3.5.2) and (3.5.3) that
(ii)' If $\beta_j = 0$ for all $j$, then
$$\frac{MS_B}{\sigma_e^2} \sim \frac{\chi^2[b - 1]}{b - 1}; \qquad (3.5.4)$$
(iii)' If $\alpha_i = 0$ for all $i$, then
$$\frac{MS_A}{\sigma_e^2} \sim \frac{\chi^2[a - 1]}{a - 1}. \qquad (3.5.5)$$
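MODEL II (RANDOM EFFECTS)
Under the distribution assumptions of Model II, it can be shown that:

(a) The quantities $MS_E$, $MS_B$, and $MS_A$ are statistically independent.
(b) The following results are true:

(i)
$$\frac{MS_E}{\sigma_e^2} \sim \frac{\chi^2[(a - 1)(b - 1)]}{(a - 1)(b - 1)}, \qquad (3.5.6)$$
(ii)
$$\frac{MS_B}{\sigma_e^2 + a\sigma_\beta^2} \sim \frac{\chi^2[b - 1]}{b - 1}, \qquad (3.5.7)$$
and
(iii)
$$\frac{MS_A}{\sigma_e^2 + b\sigma_\alpha^2} \sim \frac{\chi^2[a - 1]}{a - 1}. \qquad (3.5.8)$$

That is, the ratio of $MS_E$ to $\sigma_e^2$ is a $\chi^2[(a - 1)(b - 1)]$ variable divided by
$(a - 1)(b - 1)$, the ratio of $MS_B$ to $\sigma_e^2 + a\sigma_\beta^2$ is a $\chi^2[b - 1]$ variable divided
by $b - 1$, and the ratio of $MS_A$ to $\sigma_e^2 + b\sigma_\alpha^2$ is a $\chi^2[a - 1]$ variable divided
by $a - 1$.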


MODEL III (MIXED EFFECTS) 
Under the distribution assumptions of Model III, it can be shown that: 


(a) The quantities $MS_E$, $MS_B$, and $MS_A$ are statistically independent.
(b) The following results are true:

(i)
$$\frac{MS_E}{\sigma_e^2} \sim \frac{\chi^2[(a - 1)(b - 1)]}{(a - 1)(b - 1)}, \qquad (3.5.9)$$
(ii)
$$\frac{MS_B}{\sigma_e^2 + a\sigma_\beta^2} \sim \frac{\chi^2[b - 1]}{b - 1}, \qquad (3.5.10)$$
and
(iii)
$$\frac{MS_A}{\sigma_e^2} \sim \frac{\chi'^2[a - 1, \lambda_A]}{a - 1}, \qquad (3.5.11)$$
where
$$\lambda_A = \frac{b}{2\sigma_e^2}\sum_{i=1}^{a}\alpha_i^2.$$

It follows from (3.5.11) that if $\alpha_i = 0$ for all $i$, then
(iii)'
$$\frac{MS_A}{\sigma_e^2} \sim \frac{\chi^2[a - 1]}{a - 1}. \qquad (3.5.12)$$

OF VARIANCE F TESTS


In this section, we present the usual hypotheses about the effects of factors A and B
and the appropriate F tests for fixed, random, and mixed effects models.

MODEL I (FIXED EFFECTS)
Under Model I, the usual hypotheses of interest are:

$$H_0^B: \text{all } \beta_j = 0 \quad \text{versus} \quad H_1^B: \text{not all } \beta_j \text{ are zero}; \qquad (3.6.1)$$

and

$$H_0^A: \text{all } \alpha_i = 0 \quad \text{versus} \quad H_1^A: \text{not all } \alpha_i \text{ are zero}. \qquad (3.6.2)$$




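In order to develop test procedures for the hypotheses (3.6.1) and (3.6.2), we
note from (3.4.12), (3.4.16), and (3.4.17) that when the null hypotheses $H_0^B$
and $H_0^A$ are true, we have

$$E(MS_E) = \sigma_e^2, \quad E(MS_B) = \sigma_e^2, \quad \text{and} \quad E(MS_A) = \sigma_e^2;$$

that is, $MS_E$, $MS_B$, and $MS_A$ are unbiased estimates of the same quantity $\sigma_e^2$.
It then follows, from (3.5.1), (3.5.4), and (3.5.5), that

$$F_B = \frac{MS_B/\sigma_e^2}{MS_E/\sigma_e^2} = \frac{MS_B}{MS_E} \sim F[b - 1, (a - 1)(b - 1)] \qquad (3.6.3)$$

and

$$F_A = \frac{MS_A/\sigma_e^2}{MS_E/\sigma_e^2} = \frac{MS_A}{MS_E} \sim F[a - 1, (a - 1)(b - 1)]. \qquad (3.6.4)$$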


Therefore, the statistics (3.6.3) and (3.6.4) provide suitable test procedures 
for testing hypotheses (3.6.1) and (3.6.2), respectively. Thus, $H_0^B$ is rejected at
the $\alpha$-level of significance if

$$F_B > F[b - 1, (a - 1)(b - 1); 1 - \alpha]. \qquad (3.6.5)$$

Similarly, $H_0^A$ is rejected at the $\alpha$-level of significance if

$$F_A > F[a - 1, (a - 1)(b - 1); 1 - \alpha]. \qquad (3.6.6)$$
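The decision rules (3.6.5) and (3.6.6) can be sketched in Python. This sketch assumes SciPy is available and uses `scipy.stats.f.ppf` for the critical value $F[\cdot, \cdot; 1 - \alpha]$ and `scipy.stats.f.sf` for a p-value; the 3 x 4 table `y` and the function name `two_way_f_tests` are hypothetical and serve only as an illustration.

```python
from scipy.stats import f

# Hypothetical 3 x 4 layout (rows = levels of A, columns = levels of B), for illustration only.
y = [[12, 15, 14, 17],
     [10, 13, 13, 16],
     [8, 12, 10, 14]]


def two_way_f_tests(y, alpha=0.05):
    """F_A and F_B from (3.6.3)-(3.6.4), with critical values as in (3.6.5)-(3.6.6)."""
    a, b = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (a * b)
    row_mean = [sum(row) / b for row in y]
    col_mean = [sum(y[i][j] for i in range(a)) / a for j in range(b)]
    ms_a = b * sum((m - grand) ** 2 for m in row_mean) / (a - 1)
    ms_b = a * sum((m - grand) ** 2 for m in col_mean) / (b - 1)
    df_e = (a - 1) * (b - 1)
    ms_e = sum((y[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
               for i in range(a) for j in range(b)) / df_e
    results = {}
    for name, ms, df in (("A", ms_a, a - 1), ("B", ms_b, b - 1)):
        stat = ms / ms_e
        crit = f.ppf(1 - alpha, df, df_e)  # F[df, (a-1)(b-1); 1 - alpha]
        results[name] = {"F": stat, "critical": crit,
                         "p_value": f.sf(stat, df, df_e), "reject": stat > crit}
    return results


for factor, res in two_way_f_tests(y).items():
    print(factor, round(res["F"], 2), round(res["critical"], 2), res["reject"])
```

Both statistics share the same denominator $MS_E$, so the two tests are not independent of each other even though the mean squares are.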


It should be noted, however, that when the null hypotheses $H_0^B$ and $H_0^A$ are not
true, it follows from (3.5.1) through (3.5.3) that

$$\frac{MS_B}{MS_E} \sim F'[b - 1, (a - 1)(b - 1); \lambda_B] \qquad (3.6.7)$$

and

$$\frac{MS_A}{MS_E} \sim F'[a - 1, (a - 1)(b - 1); \lambda_A], \qquad (3.6.8)$$

where $F'[\cdot, \cdot; \cdot]$ denotes a statistic having a noncentral F distribution with respective
degrees of freedom and the noncentrality parameters $\lambda_B$ and $\lambda_A$ defined
by

$$\lambda_B = \frac{a}{2\sigma_e^2}\sum_{j=1}^{b}\beta_j^2$$

and

$$\lambda_A = \frac{b}{2\sigma_e^2}\sum_{i=1}^{a}\alpha_i^2.$$

The results (3.6.7) and (3.6.8) are employed in evaluating the power of these F
tests in Section 3.11.
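As a sketch of how (3.6.7) translates into a power calculation, the Python function below evaluates the power of the test (3.6.5) for assumed values of the $\beta_j$, $\sigma_e^2$, and $a$. It assumes SciPy's `scipy.stats.ncf`, whose noncentrality parameter is the sum of squared standardized means, $a\sum_j\beta_j^2/\sigma_e^2$; because the text defines $\lambda_B$ with a factor of $1/2$, the code passes $2\lambda_B$. The function name `power_f_test_b` and all numerical inputs are hypothetical.

```python
from scipy.stats import f, ncf


def power_f_test_b(beta, sigma2_e, a, alpha=0.05):
    """Power of the level-alpha test (3.6.5) of H0: all beta_j = 0 under Model I.

    beta holds assumed fixed effects (summing to zero); sigma2_e and a are assumed too.
    """
    b = len(beta)
    df1, df2 = b - 1, (a - 1) * (b - 1)
    lam_b = a * sum(bj ** 2 for bj in beta) / (2 * sigma2_e)  # lambda_B as defined in the text
    nc = 2 * lam_b  # SciPy's ncf noncentrality is a * sum(beta_j^2) / sigma_e^2, i.e. 2 * lambda_B
    crit = f.ppf(1 - alpha, df1, df2)
    return ncf.sf(crit, df1, df2, nc)


print(round(power_f_test_b([-1.0, 0.0, 1.0], sigma2_e=1.0, a=4), 3))
```

Doubling the effects quadruples the noncentrality, so the power should increase accordingly.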




MODEL II (RANDOM EFFECTS) 


Under Model II, testing significance of the effects of a factor is equivalent to 
testing the hypothesis that the corresponding variance component is zero. Thus, 
the usual analogues of hypotheses (3.6.1) and (3.6.2) are: 


$$H_0^B: \sigma_\beta^2 = 0 \quad \text{versus} \quad H_1^B: \sigma_\beta^2 > 0 \qquad (3.6.9)$$

and

$$H_0^A: \sigma_\alpha^2 = 0 \quad \text{versus} \quad H_1^A: \sigma_\alpha^2 > 0. \qquad (3.6.10)$$

It can be readily seen that the statistics (3.6.3) and (3.6.4), obtained in the case
of Model I, also provide suitable test procedures for the hypotheses (3.6.9) and
(3.6.10), respectively. However, under the alternative hypotheses, the aforesaid
statistics are distributed as constant multiples of (central) F variables rather than
as noncentral F variables as in the case of Model I; for example, by (3.5.6) and (3.5.7),
the statistic (3.6.3) is distributed as $(1 + a\sigma_\beta^2/\sigma_e^2)$ times an $F[b - 1, (a - 1)(b - 1)]$
variable. This fact greatly simplifies the computation of power under Model II.

MODEL III (MIXED EFFECTS)

Under Model III, the hypotheses of interest are $\sigma_\beta^2 = 0$ and $\alpha_i = 0$ for all $i$. Again, it
can be seen that the test statistics (3.6.3) and (3.6.4) developed earlier are also
applicable for these hypotheses.


3.7 POINT ESTIMATION 


In this section, we present results on point estimation for parameters of interest 
under fixed, random, and mixed effects models. 


MODEL I (FIXED EFFECTS) 


In the case of the fixed effects model, the least squares estimators¹ of the parameters
$\mu$, $\alpha_i$, and $\beta_j$ are obtained by minimizing the residual sum of squares

$$Q = \sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij} - \mu - \alpha_i - \beta_j)^2, \qquad (3.7.1)$$

¹ The least squares estimators in this case are the same as those obtained by the maximum likelihood
method under the assumption of normality.

with respect to $\mu$, $\alpha_i$, and $\beta_j$, subject to the restrictions

$$\sum_{i=1}^{a}\alpha_i = \sum_{j=1}^{b}\beta_j = 0. \qquad (3.7.2)$$

The resultant estimators are:

$$\hat{\mu} = \bar{y}_{..}, \qquad (3.7.3)$$

$$\hat{\alpha}_i = \bar{y}_{i.} - \bar{y}_{..}, \quad i = 1, 2, \ldots, a, \qquad (3.7.4)$$
and
$$\hat{\beta}_j = \bar{y}_{.j} - \bar{y}_{..}, \quad j = 1, 2, \ldots, b. \qquad (3.7.5)$$

These are the so-called best linear unbiased estimators (BLUE). The variances
of the estimators (3.7.3) through (3.7.5) are:

$$\mathrm{Var}(\hat{\mu}) = \sigma_e^2/ab, \qquad (3.7.6)$$

$$\mathrm{Var}(\hat{\alpha}_i) = (a - 1)\sigma_e^2/ab, \qquad (3.7.7)$$
and
$$\mathrm{Var}(\hat{\beta}_j) = (b - 1)\sigma_e^2/ab. \qquad (3.7.8)$$


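The other parameters of interest are: $\mu + \alpha_i$ (mean levels of factor A), $\mu + \beta_j$
(mean levels of factor B), pairwise differences $\alpha_i - \alpha_{i'}$ and $\beta_j - \beta_{j'}$, and
the contrasts of the type $\sum_{i=1}^{a}\ell_i\alpha_i$ $(\sum_{i=1}^{a}\ell_i = 0)$ and $\sum_{j=1}^{b}\ell_j\beta_j$ $(\sum_{j=1}^{b}\ell_j = 0)$.
Their respective estimates, together with their variances, are given by

$$\widehat{\mu + \alpha_i} = \bar{y}_{i.}, \quad \mathrm{Var}(\widehat{\mu + \alpha_i}) = \sigma_e^2/b; \qquad (3.7.9)$$

$$\widehat{\mu + \beta_j} = \bar{y}_{.j}, \quad \mathrm{Var}(\widehat{\mu + \beta_j}) = \sigma_e^2/a; \qquad (3.7.10)$$

$$\widehat{\alpha_i - \alpha_{i'}} = \bar{y}_{i.} - \bar{y}_{i'.}, \quad \mathrm{Var}(\widehat{\alpha_i - \alpha_{i'}}) = 2\sigma_e^2/b; \qquad (3.7.11)$$

$$\widehat{\beta_j - \beta_{j'}} = \bar{y}_{.j} - \bar{y}_{.j'}, \quad \mathrm{Var}(\widehat{\beta_j - \beta_{j'}}) = 2\sigma_e^2/a; \qquad (3.7.12)$$

$$\widehat{\sum_{i=1}^{a}\ell_i\alpha_i} = \sum_{i=1}^{a}\ell_i\bar{y}_{i.}, \quad \mathrm{Var}\Big(\widehat{\sum_{i=1}^{a}\ell_i\alpha_i}\Big) = \sigma_e^2\sum_{i=1}^{a}\ell_i^2/b; \qquad (3.7.13)$$

and

$$\widehat{\sum_{j=1}^{b}\ell_j\beta_j} = \sum_{j=1}^{b}\ell_j\bar{y}_{.j}, \quad \mathrm{Var}\Big(\widehat{\sum_{j=1}^{b}\ell_j\beta_j}\Big) = \sigma_e^2\sum_{j=1}^{b}\ell_j^2/a. \qquad (3.7.14)$$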
The variance $\sigma_e^2$ is, of course, estimated unbiasedly by

$$\hat{\sigma}_e^2 = MS_E. \qquad (3.7.15)$$
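The estimators (3.7.3) through (3.7.5) and (3.7.15) amount to a few averages. The Python sketch below computes them for a hypothetical 3 x 4 table `y` (invented for illustration); the helper name `fixed_effects_estimates` is likewise hypothetical. It also confirms that the estimated effects satisfy the restrictions (3.7.2).

```python
# Hypothetical 3 x 4 layout (rows = levels of A, columns = levels of B), for illustration only.
y = [[12, 15, 14, 17],
     [10, 13, 13, 16],
     [8, 12, 10, 14]]


def fixed_effects_estimates(y):
    """Estimates (3.7.3)-(3.7.5) and (3.7.15) under Model I."""
    a, b = len(y), len(y[0])
    mu_hat = sum(sum(row) for row in y) / (a * b)                                 # y-bar ..
    alpha_hat = [sum(row) / b - mu_hat for row in y]                              # y-bar i. - y-bar ..
    beta_hat = [sum(y[i][j] for i in range(a)) / a - mu_hat for j in range(b)]    # y-bar .j - y-bar ..
    # MS_E from the residuals y_ij - mu_hat - alpha_hat_i - beta_hat_j
    ss_e = sum((y[i][j] - mu_hat - alpha_hat[i] - beta_hat[j]) ** 2
               for i in range(a) for j in range(b))
    sigma2_hat = ss_e / ((a - 1) * (b - 1))
    return mu_hat, alpha_hat, beta_hat, sigma2_hat


mu_hat, alpha_hat, beta_hat, sigma2_hat = fixed_effects_estimates(y)
print(round(mu_hat, 4), [round(v, 4) for v in alpha_hat], round(sigma2_hat, 4))
```

The residual $y_{ij} - \hat{\mu} - \hat{\alpha}_i - \hat{\beta}_j$ used here equals $y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..}$, so $\hat{\sigma}_e^2$ is exactly $MS_E$.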


MODEL II (RANDOM EFFECTS) 


In the case of the random effects model, the variance components may be 
estimated by the analysis of variance method, that is, by equating the observed 
mean squares in the lines of the analysis of variance table to their respective 
expected values and solving the equations for the variance components. The 
resulting estimators are: 


$$\hat{\sigma}_e^2 = MS_E, \qquad (3.7.16)$$

$$\hat{\sigma}_\beta^2 = \frac{1}{a}(MS_B - MS_E), \qquad (3.7.17)$$
and
$$\hat{\sigma}_\alpha^2 = \frac{1}{b}(MS_A - MS_E). \qquad (3.7.18)$$

It can be shown that these estimators are also the maximum likelihood estimators
(corrected for bias) of the corresponding parameters. The parameter $\mu$ is, of
course, estimated by

$$\hat{\mu} = \bar{y}_{..}, \qquad (3.7.19)$$

as in the case of the fixed effects model. The remarks concerning the negative
estimates and the optimal properties of the analysis of variance estimators made
in Section 2.9 also apply here.²
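The analysis of variance estimators (3.7.16) through (3.7.18) are simple functions of the mean squares. The Python sketch below applies them to hypothetical mean squares for an $a = 3$, $b = 4$ layout; the function name `anova_variance_components` and the inputs are assumptions for illustration. The second usage line shows how a negative estimate arises when a mean square falls below $MS_E$.

```python
def anova_variance_components(ms_a, ms_b, ms_e, a, b):
    """ANOVA (method-of-moments) estimators (3.7.16)-(3.7.18) under Model II."""
    sigma2_e = ms_e
    sigma2_beta = (ms_b - ms_e) / a   # from E(MS_B) = sigma_e^2 + a sigma_beta^2
    sigma2_alpha = (ms_a - ms_e) / b  # from E(MS_A) = sigma_e^2 + b sigma_alpha^2
    return sigma2_alpha, sigma2_beta, sigma2_e  # the first two can come out negative


# Hypothetical mean squares for a layout with a = 3 and b = 4.
print(anova_variance_components(37 / 3, 149 / 9, 2 / 9, a=3, b=4))
# Here MS_B < MS_E, so the estimate of sigma_beta^2 is negative.
print(anova_variance_components(1.0, 0.1, 0.2, a=3, b=4))
```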


MODEL III (MIXED EFFECTS) 


In the case of a mixed effects model, with A fixed and B random, the usual
parameters of interest are $\mu$, the $\alpha_i$, $\sigma_\beta^2$, and $\sigma_e^2$. The corresponding estimators are:

$$\hat{\mu} = \bar{y}_{..}, \qquad (3.7.20)$$

$$\hat{\alpha}_i = \bar{y}_{i.} - \bar{y}_{..}, \quad i = 1, 2, \ldots, a, \qquad (3.7.21)$$

$$\hat{\sigma}_\beta^2 = \frac{1}{a}(MS_B - MS_E), \qquad (3.7.22)$$
and
$$\hat{\sigma}_e^2 = MS_E. \qquad (3.7.23)$$

² For a discussion of the nonnegative maximum likelihood as well as other nonnegative estimation
procedures and their properties, the reader is referred to Sahai (1974b) and Sahai and Khurshid
(1992).
3.8 INTERVAL ESTIMATION


In this section, we present results on confidence intervals for parameters of 
interest under fixed, random, and mixed effects models. 


MODEL I (FIXED EFFECTS)


Using the distribution theory of the error mean square, that is,

$$\frac{MS_E}{\sigma_e^2} \sim \frac{\chi^2[(a - 1)(b - 1)]}{(a - 1)(b - 1)}, \qquad (3.8.1)$$

a $100(1 - \alpha)$ percent confidence interval for $\sigma_e^2$ is obtained as

$$\frac{(a - 1)(b - 1)MS_E}{\chi^2[(a - 1)(b - 1), 1 - \alpha/2]} \le \sigma_e^2 \le \frac{(a - 1)(b - 1)MS_E}{\chi^2[(a - 1)(b - 1), \alpha/2]}. \qquad (3.8.2)$$
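The interval (3.8.2) can be sketched in Python, assuming SciPy's `scipy.stats.chi2.ppf` supplies the quantiles $\chi^2[\nu, p]$ (the value below which a proportion $p$ of the distribution lies). The function name `error_variance_ci` and the value of $MS_E$ in the usage line are hypothetical.

```python
from scipy.stats import chi2


def error_variance_ci(ms_e, a, b, alpha=0.05):
    """100(1 - alpha)% confidence interval (3.8.2) for sigma_e^2."""
    df = (a - 1) * (b - 1)
    lower = df * ms_e / chi2.ppf(1 - alpha / 2, df)  # divide by the upper chi-square quantile
    upper = df * ms_e / chi2.ppf(alpha / 2, df)      # divide by the lower chi-square quantile
    return lower, upper


print(error_variance_ci(2 / 9, a=3, b=4))  # hypothetical MS_E for a 3 x 4 layout
```

The interval is not symmetric about $MS_E$, reflecting the skewness of the chi-square distribution with few degrees of freedom.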


Also, it is possible to construct confidence intervals using the $t$ distribution for
a pairwise difference $\alpha_i - \alpha_{i'}$ or the contrast $\sum_{i=1}^{a}\ell_i\alpha_i$, where $\sum_{i=1}^{a}\ell_i = 0$.
To obtain the confidence limits for $\alpha_i - \alpha_{i'}$, we note from (3.7.11) that

$$\widehat{\alpha_i - \alpha_{i'}} = \bar{y}_{i.} - \bar{y}_{i'.}$$

with

$$\mathrm{Var}(\bar{y}_{i.} - \bar{y}_{i'.}) = 2\sigma_e^2/b.$$

The confidence limits on $\alpha_i - \alpha_{i'}$ can, therefore, be derived from the relation

$$\frac{(\bar{y}_{i.} - \bar{y}_{i'.}) - (\alpha_i - \alpha_{i'})}{\sqrt{2MS_E/b}} \sim t[(a - 1)(b - 1)].$$

Thus, the corresponding $100(1 - \alpha)$ percent confidence limits for $\alpha_i - \alpha_{i'}$ are
given by

$$(\bar{y}_{i.} - \bar{y}_{i'.}) \pm t[(a - 1)(b - 1), 1 - \alpha/2]\sqrt{2MS_E/b}.$$

Similarly, confidence limits for the contrast $\sum_{i=1}^{a}\ell_i\alpha_i$ can be obtained from
the relation

$$\frac{\sum_{i=1}^{a}\ell_i\bar{y}_{i.} - \sum_{i=1}^{a}\ell_i\alpha_i}{\sqrt{MS_E\sum_{i=1}^{a}\ell_i^2/b}} \sim t[(a - 1)(b - 1)],$$

which yields

$$P\left\{\sum_{i=1}^{a}\ell_i\bar{y}_{i.} - M < \sum_{i=1}^{a}\ell_i\alpha_i < \sum_{i=1}^{a}\ell_i\bar{y}_{i.} + M\right\} = 1 - \alpha,$$

where

$$M = t[(a - 1)(b - 1), 1 - \alpha/2]\sqrt{MS_E\sum_{i=1}^{a}\ell_i^2/b}.$$

Similar results hold for any pairwise difference $\beta_j - \beta_{j'}$ or the contrast
$\sum_{j=1}^{b}\ell_j\beta_j$ $(\sum_{j=1}^{b}\ell_j = 0)$. However, for the reasons given in Section 2.19, the
multiple comparison procedures discussed in Section 3.12 should be preferred.
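The $t$-based limits above can be sketched in Python, assuming SciPy's `scipy.stats.t.ppf` for the quantile $t[(a - 1)(b - 1), 1 - \alpha/2]$. A pairwise difference $\alpha_i - \alpha_{i'}$ is handled as the contrast with coefficients $+1$ and $-1$, for which $\sum\ell_i^2 = 2$ reproduces the variance $2\sigma_e^2/b$. The function name `contrast_ci` and the summary values in the usage line are hypothetical, and, as the text notes, such single-contrast intervals do not control the error rate over several comparisons.

```python
from math import sqrt

from scipy.stats import t


def contrast_ci(row_means, coeffs, ms_e, a, b, alpha=0.05):
    """t-based 100(1 - alpha)% limits for sum(l_i * alpha_i) with sum(l_i) = 0.

    row_means are the y-bar i. values; each one averages b observations.
    """
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coeffs, row_means))
    df = (a - 1) * (b - 1)
    # half_width is the quantity M in the text
    half_width = t.ppf(1 - alpha / 2, df) * sqrt(ms_e * sum(c * c for c in coeffs) / b)
    return estimate - half_width, estimate + half_width


# Hypothetical summary values: row means of a 3 x 4 layout and MS_E = 2/9.
# The pairwise difference alpha_1 - alpha_3 is the contrast with coefficients (1, 0, -1).
print(contrast_ci([14.5, 13.0, 11.0], [1, 0, -1], ms_e=2 / 9, a=3, b=4))
```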


MODEL II (RANDOM EFFECTS) 


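Exact confidence intervals for $\sigma_e^2$, $\sigma_e^2 + a\sigma_\beta^2$, or $\sigma_e^2 + b\sigma_\alpha^2$, variance ratios $\sigma_\beta^2/\sigma_e^2$
and $\sigma_\alpha^2/\sigma_e^2$, and proportions of variances $\sigma_\beta^2/(\sigma_e^2 + \sigma_\beta^2)$, $\sigma_e^2/(\sigma_e^2 + \sigma_\beta^2)$, $\sigma_\alpha^2/(\sigma_e^2 + \sigma_\alpha^2)$,
and $\sigma_e^2/(\sigma_e^2 + \sigma_\alpha^2)$ can be obtained by using the results on the sampling
distribution of mean squares. In particular, the probability is $1 - \alpha$ that the
interval

$$\left[\frac{1}{a}\left(\frac{MS_B}{MS_E\,F[b - 1, (a - 1)(b - 1); 1 - \alpha/2]} - 1\right),\ \frac{1}{a}\left(\frac{MS_B}{MS_E\,F[b - 1, (a - 1)(b - 1); \alpha/2]} - 1\right)\right]$$

captures $\sigma_\beta^2/\sigma_e^2$. However, as before, exact confidence intervals for $\sigma_\alpha^2$ and $\sigma_\beta^2$
do not exist. The approximate procedures available in the case of one-way
classification are also applicable here.³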
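The interval for $\sigma_\beta^2/\sigma_e^2$ follows from (3.5.6) and (3.5.7), since $MS_B/MS_E$ is distributed as $(1 + a\sigma_\beta^2/\sigma_e^2)$ times an $F[b - 1, (a - 1)(b - 1)]$ variable. A minimal Python sketch, assuming SciPy's `scipy.stats.f.ppf` for the F quantiles; the function name `variance_ratio_ci` and the mean squares in the usage line are hypothetical.

```python
from scipy.stats import f


def variance_ratio_ci(ms_b, ms_e, a, b, alpha=0.05):
    """Exact 100(1 - alpha)% interval for sigma_beta^2 / sigma_e^2 under Model II."""
    df1, df2 = b - 1, (a - 1) * (b - 1)
    ratio = ms_b / ms_e
    lower = (ratio / f.ppf(1 - alpha / 2, df1, df2) - 1) / a
    upper = (ratio / f.ppf(alpha / 2, df1, df2) - 1) / a
    return lower, upper  # a negative lower limit may be truncated at zero in practice


print(variance_ratio_ci(149 / 9, 2 / 9, a=3, b=4))  # hypothetical mean squares
```

The same pattern, with $b$ and $a$ interchanged, gives the interval for $\sigma_\alpha^2/\sigma_e^2$.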


MODEL III (MIXED EFFECTS) 


The objective in this case is to set confidence limits on the variance components
$\sigma_e^2$, $\sigma_\beta^2$, and on the fixed effects $\alpha_i$. The limits on $\sigma_e^2$ are the same as given in (3.8.2).
Again, an exact interval for $\sigma_\beta^2$ is not available, but one can set exact limits on
$\sigma_e^2 + a\sigma_\beta^2$ and $\sigma_\beta^2/\sigma_e^2$. Similarly, one can obtain confidence intervals for a pairwise
difference $\alpha_i - \alpha_{i'}$ or the contrast $\sum_{i=1}^{a}\ell_i\alpha_i$ $(\sum_{i=1}^{a}\ell_i = 0)$. Approximate
confidence intervals for $\sigma_\beta^2$ and a single factor A level mean $\mu + \alpha_i$ can be
determined using the Satterthwaite procedure (see Appendix K). For some
additional results including numerical examples, see Burdick and Graybill (1992,
pp. 154-156).

³ For some results on approximate confidence intervals for the variance components $\sigma_\alpha^2$ and $\sigma_\beta^2$,
and the total variance $\sigma_e^2 + \sigma_\beta^2 + \sigma_\alpha^2$, including a numerical example, see Burdick and Graybill
(1992, pp. 126-128). The problem of setting confidence intervals on the proportions of variability
$\sigma_\alpha^2/(\sigma_e^2 + \sigma_\beta^2 + \sigma_\alpha^2)$, $\sigma_\beta^2/(\sigma_e^2 + \sigma_\beta^2 + \sigma_\alpha^2)$, and $\sigma_e^2/(\sigma_e^2 + \sigma_\beta^2 + \sigma_\alpha^2)$ has been considered by Arteaga
et al. (1982). For a concise summary of the results, including a numerical example, see Burdick
and Graybill (1992, pp. 129-132).


3.9 COMPUTATIONAL FORMULAE AND PROCEDURE 


As in the case of one-way classification, the sums of squares $SS_T$, $SS_A$, $SS_B$, and $SS_E$ can be expressed in forms more suitable for computation. These are:


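$$SS_T=\sum_{i=1}^{a}\sum_{j=1}^{b}y_{ij}^2-\frac{y_{..}^2}{ab}, \tag{3.9.1}$$

$$SS_A=\frac{1}{b}\sum_{i=1}^{a}y_{i.}^2-\frac{y_{..}^2}{ab}, \tag{3.9.2}$$

$$SS_B=\frac{1}{a}\sum_{j=1}^{b}y_{.j}^2-\frac{y_{..}^2}{ab}, \tag{3.9.3}$$

and

$$SS_E=\sum_{i=1}^{a}\sum_{j=1}^{b}y_{ij}^2-\frac{1}{b}\sum_{i=1}^{a}y_{i.}^2-\frac{1}{a}\sum_{j=1}^{b}y_{.j}^2+\frac{y_{..}^2}{ab}, \tag{3.9.4}$$

where

$$y_{i.}=\sum_{j=1}^{b}y_{ij},\qquad y_{.j}=\sum_{i=1}^{a}y_{ij},\qquad\text{and}\qquad y_{..}=\sum_{i=1}^{a}\sum_{j=1}^{b}y_{ij}.$$

The error sum of squares $SS_E$ is usually calculated by subtracting $SS_A+SS_B$ from $SS_T$; that is,

$$SS_E=SS_T-SS_A-SS_B. \tag{3.9.5}$$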


The computational procedure for the sums of squares can be performed in 
the following sequence of steps: 


(i) Sum the observations for each row to form the row totals:

$$y_{1.},\ y_{2.},\ \ldots,\ y_{a.}.$$

(ii) Sum the observations for each column to form the column totals:

$$y_{.1},\ y_{.2},\ \ldots,\ y_{.b}.$$

(iii) Sum all the observations to obtain the overall or grand total:

$$y_{..}=\sum_{i=1}^{a}\sum_{j=1}^{b}y_{ij}.$$

(iv) Form the sum of squares of the individual observations to yield:

$$\sum_{i=1}^{a}\sum_{j=1}^{b}y_{ij}^2=y_{11}^2+y_{12}^2+\cdots+y_{ab}^2.$$

(v) Form the sum of squares of the totals for each row and divide it by $b$ to yield:

$$\sum_{i=1}^{a}y_{i.}^2/b.$$

(vi) Form the sum of squares of the totals for each column and divide it by $a$ to yield:

$$\sum_{j=1}^{b}y_{.j}^2/a.$$

(vii) Square the grand total and divide it by $ab$ to obtain the correction factor:

$$y_{..}^2/ab.$$

Now, the required sums of squares, $SS_T$, $SS_A$, $SS_B$, and $SS_E$, are obtained by using the computational formulae (3.9.1), (3.9.2), (3.9.3), and (3.9.4) or (3.9.5), respectively.
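Steps (i) through (vii) translate directly into code. The following Python sketch (the function name `two_way_sums_of_squares` is illustrative, not from any package) applies them to the wear-testing data of Table 3.3, which are analyzed by hand in Section 3.13.

```python
def two_way_sums_of_squares(data):
    """SS_T, SS_A, SS_B, SS_E for a two-way layout with one observation per cell.

    `data` holds a rows (levels of factor A), each with b observations
    (levels of factor B)."""
    a, b = len(data), len(data[0])
    row_totals = [sum(row) for row in data]                              # step (i)
    col_totals = [sum(data[i][j] for i in range(a)) for j in range(b)]   # step (ii)
    grand_total = sum(row_totals)                                        # step (iii)
    sum_sq = sum(y * y for row in data for y in row)                     # step (iv)
    row_term = sum(t * t for t in row_totals) / b                        # step (v)
    col_term = sum(t * t for t in col_totals) / a                        # step (vi)
    correction = grand_total ** 2 / (a * b)                              # step (vii)
    ss_t = sum_sq - correction                                           # (3.9.1)
    ss_a = row_term - correction                                         # (3.9.2)
    ss_b = col_term - correction                                         # (3.9.3)
    ss_e = ss_t - ss_a - ss_b                                            # (3.9.5)
    return {"SS_T": ss_t, "SS_A": ss_a, "SS_B": ss_b, "SS_E": ss_e}


# Table 3.3: rows are materials 1-4, columns are positions 1-3.
WEAR = [[241, 270, 274], [195, 241, 218], [235, 273, 230], [234, 236, 227]]
ss = two_way_sums_of_squares(WEAR)
```

Computing $SS_E$ by subtraction, as in (3.9.5), avoids a fourth pass over the data.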

It is expected that most investigators would use computers to handle analysis of variance calculations. Otherwise, an electronic calculator is highly recommended, especially for a large data set. Such calculators have the additional advantage that the totals and sums of squares can be determined at the same time, providing a check on the previous calculation.


3.10 MISSING OBSERVATIONS 


In the analysis of variance discussed in this chapter, it is assumed that there is exactly one observation in each cell of the two-way layout as shown in Table 3.1. However, in the process of conducting an experiment, some of the observations may be lost. For example, the experimenter may fail to record an observation, animals or plants may die during the course of the experiment, or a subject may withdraw before the completion of the experiment. In such cases the approximate analysis discussed here may be used. The method consists of inserting estimates of the missing values and then carrying out the usual analysis as if no observations were missing. The estimates are chosen so as to minimize the residual or error sum of squares. However, care should be taken not to count these estimates when computing the relevant degrees of freedom. Thus, for every missing value being estimated, the degrees of freedom for the residual mean square are reduced by one.

Suppose the observation corresponding to the $i$-th row and the $j$-th column is missing, and let it be denoted by $y_{ij}$. Then all the sums of squares are computed in the usual way except, of course, that they all involve $y_{ij}$. It is then an elementary calculus problem to show that the value of $y_{ij}$ which minimizes the error sum of squares is given by

$$\hat{y}_{ij}=\frac{a\,y'_{i.}+b\,y'_{.j}-y'_{..}}{(a-1)(b-1)}, \tag{3.10.1}$$

where $y'_{i.}$ denotes the total of the $b-1$ observations in the $i$-th row, $y'_{.j}$ denotes the total of the $a-1$ observations in the $j$-th column, and $y'_{..}$ denotes the sum of all $ab-1$ observations. The result follows on setting the derivative of the error sum of squares (3.9.4) with respect to $y_{ij}$ equal to zero and solving. The full derivation of the formula (3.10.1) may be found in an intermediate or advanced level text (see, e.g., Peng (1967, pp. 109-110); Montgomery (1991, pp. 148-151); Hinkelmann and Kempthorne (1994, pp. 266-267)). If $\hat{y}_{ij}$ obtained from (3.10.1) is substituted for the missing value, then $SS_A$, $SS_B$, and $SS_E$ can be computed in the usual way.
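The formula can be checked numerically. The sketch below, using a hypothetical helper `estimate_missing`, evaluates (3.10.1) for one missing cell; note that the row total is multiplied by $a$ and the column total by $b$, which is what setting the derivative of (3.9.4) to zero gives. As data it uses Table 3.3 with the Material 1, Position 1 value treated as missing, the setting of the worked example in Section 3.16.

```python
def estimate_missing(data, i, j):
    """Estimate (3.10.1) of a single missing cell data[i][j] (0-based indices).

    Whatever is stored in the missing cell (e.g. None) is ignored."""
    a, b = len(data), len(data[0])
    row_total = sum(data[i][c] for c in range(b) if c != j)   # y'_i. (b - 1 values)
    col_total = sum(data[r][j] for r in range(a) if r != i)   # y'_.j (a - 1 values)
    grand_total = sum(data[r][c] for r in range(a) for c in range(b)
                      if (r, c) != (i, j))                    # y'_.. (ab - 1 values)
    return (a * row_total + b * col_total - grand_total) / ((a - 1) * (b - 1))


# Table 3.3 with the Material 1, Position 1 value treated as missing.
WEAR_MISSING = [[None, 270, 274], [195, 241, 218], [235, 273, 230], [234, 236, 227]]
y_hat = estimate_missing(WEAR_MISSING, 0, 0)
```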

The formula (3.10.1) was first discussed by Allen and Wishart (1930). The $F$ test will be slightly biased, and the reader is referred to the paper by Yates (1933) for a discussion of this point. Also, it can be shown that when there is a missing observation in the $i$-th row,

$$\operatorname{Var}(\bar{y}_{i.})=\left[\frac{1}{b}+\frac{a}{b(a-1)(b-1)}\right]\sigma_e^2 \tag{3.10.2}$$

and, for a row $i'\neq i$ with no missing value,

$$\operatorname{Var}(\bar{y}_{i.}-\bar{y}_{i'.})=\left[\frac{2}{b}+\frac{a}{b(a-1)(b-1)}\right]\sigma_e^2. \tag{3.10.3}$$

The expression (3.10.2) can also be written as

$$\operatorname{Var}(\bar{y}_{i.})=\left[1+\frac{1}{b(a-1)}\right]\frac{\sigma_e^2}{b-1}. \tag{3.10.4}$$

Note that the expression (3.10.4) is slightly greater than $\sigma_e^2/(b-1)$, whereas the variance of a row mean with no missing value is $\sigma_e^2/b$. Furthermore, the error mean square is estimated correctly, but the mean squares for factors A and B are somewhat inflated. To correct for this bias, the quantity

$$\frac{[y'_{.j}-(a-1)\hat{y}_{ij}]^2}{a(a-1)^2} \tag{3.10.5}$$

is subtracted from the factor A mean square. Similarly, to test for the factor B effects, the quantity

$$\frac{[y'_{i.}-(b-1)\hat{y}_{ij}]^2}{b(b-1)^2} \tag{3.10.6}$$

is subtracted from the factor B mean square.
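A short sketch of the two corrections follows. It assumes the divisors $a(a-1)^2$ and $b(b-1)^2$; these are the forms that reproduce the values 296.988 and 87.480 of the worked example in Section 3.16, and they amount to removing $[y'_{.j}-(a-1)\hat{y}_{ij}]^2/[a(a-1)]$ from the factor A sum of squares before dividing by its $a-1$ degrees of freedom (and analogously for factor B). The helper name is hypothetical; the value 255.8 is the rounded estimate from (3.10.1) used in that example.

```python
def ms_bias_corrections(data, i, j, y_hat):
    """Quantities (3.10.5) and (3.10.6) to subtract from the factor A and factor B
    mean squares after the missing cell data[i][j] (0-based) is replaced by y_hat."""
    a, b = len(data), len(data[0])
    col_total = sum(data[r][j] for r in range(a) if r != i)   # y'_.j, missing cell excluded
    row_total = sum(data[i][c] for c in range(b) if c != j)   # y'_i., missing cell excluded
    corr_a = (col_total - (a - 1) * y_hat) ** 2 / (a * (a - 1) ** 2)
    corr_b = (row_total - (b - 1) * y_hat) ** 2 / (b * (b - 1) ** 2)
    return corr_a, corr_b


WEAR_MISSING = [[None, 270, 274], [195, 241, 218], [235, 273, 230], [234, 236, 227]]
corr_material, corr_position = ms_bias_corrections(WEAR_MISSING, 0, 0, 255.8)
```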

If there are two missing values, one can either repeat the foregoing procedure with two simultaneous equations obtained by minimizing the error sum of squares with respect to both missing values, or use an iterative procedure: guess one value, fit the other by formula (3.10.1), then go back and refit the first value, and so on. When there are several missing values, one first guesses values for all missing cells except the first one. Formula (3.10.1) is then used to find an initial estimate of the first missing value. With this initial estimate for the first one and the values guessed for the others, formula (3.10.1) is again used to obtain an estimate of the second value. The process is continued in this manner to obtain estimates for the remaining values. After the first cycle of initial estimates is complete, a second set of estimated values is found, and the entire process is repeated until the estimated values no longer differ appreciably from those obtained in the previous cycle. The details may be found in Tocher (1952), Bennett and Franklin (1954, p. 382), Cochran and Cox (1957, pp. 110-112), and Steel and Torrie (1980, pp. 211-213). Healy and Westmacott (1956) gave a more general iterative method using a program that analyzes complete data rather rapidly, and Rubin (1972) presented a noniterative method. For $m$ missing values, a computer program that will invert an $m\times m$ matrix is usually required. The general problem of missing data can usually be dealt with much more efficiently using an algorithm developed by Dempster et al. (1977). For further discussion of this topic the reader is referred to Anderson (1946), Dodge (1985), and Snedecor and Cochran (1989, pp. 273-278).⁴ For a discussion of correction for bias in the mean squares for factors A and B when two or more observations in a row or column are missing, see Glen and Kramer (1958).
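The cyclic procedure just described can be sketched in a few lines. This is a minimal Python illustration under stated assumptions: missing cells are marked `None`, all of them start from the mean of the observed values (an arbitrary choice of initial guesses), and cycling stops when no estimate changes by more than a tolerance. Each update is formula (3.10.1) with the current values of the other missing cells treated as observed. The function name is hypothetical, and the sketch is not the Healy-Westmacott or Dempster et al. algorithm.

```python
def fill_missing_iteratively(data, tol=1e-8, max_cycles=1000):
    """Cycle through the missing cells (None), re-estimating each by (3.10.1)
    while holding the current values of the others fixed. Returns a completed
    copy of `data` and the list of filled (row, column) positions."""
    a, b = len(data), len(data[0])
    filled = [list(row) for row in data]  # work on a copy; input is not changed
    missing = [(i, j) for i in range(a) for j in range(b) if data[i][j] is None]
    observed = [y for row in data for y in row if y is not None]
    start = sum(observed) / len(observed)  # arbitrary initial guess
    for i, j in missing:
        filled[i][j] = start
    for _ in range(max_cycles):
        largest_change = 0.0
        for i, j in missing:
            row_t = sum(filled[i][c] for c in range(b) if c != j)
            col_t = sum(filled[r][j] for r in range(a) if r != i)
            grand = sum(filled[r][c] for r in range(a) for c in range(b)) - filled[i][j]
            new = (a * row_t + b * col_t - grand) / ((a - 1) * (b - 1))
            largest_change = max(largest_change, abs(new - filled[i][j]))
            filled[i][j] = new
        if largest_change < tol:
            break
    return filled, missing


# Hypothetical example: Table 3.3 with two cells treated as missing.
WEAR_TWO_MISSING = [[None, 270, 274], [195, 241, 218], [235, 273, None], [234, 236, 227]]
completed, cells = fill_missing_iteratively(WEAR_TWO_MISSING)
```

With a single missing value the first cycle already returns the exact value of (3.10.1), and the second cycle only confirms it.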

It should be remarked that the use of estimates for missing values does not in any way recover the information that is lost through the missing data. It is merely a computational procedure to enable the experimenter to make an approximate analysis. It is important that the investigator examine carefully the nature of the missing values. If the reasons for missing values can be attributed to chance, the treatment comparisons based on the remaining values will be unbiased, and the methods described previously may generally be applied. It is worth remembering that the analysis of variance does not take kindly to missing data, and the utmost caution should be exercised to ensure that no observation is lost. The situation is best summed up by Cochran and Cox (1957, p. 82) when they state: “... the only complete solution to the ‘missing data’ problem is not to have them....”

⁴ Hoyle (1971) gives an introduction to spoilt data (missing, extra, and mixed-up observations) with an extensive bibliography. Afifi and Elashoff (1966, 1967), in a two-part article, have considered the problem of missing data in multivariate statistics.


3.11 POWER OF THE ANALYSIS OF VARIANCE F TESTS 


The discussion on the power of the analysis of variance $F$ test for the one-way classification given in Section 2.17 also applies here. Thus, under Model I, it follows from (3.6.5) and (3.6.7) that the power of the $F$ test for the hypothesis on the $\beta_j$'s is given by

$$\begin{aligned}\text{Power}&=P\left[\frac{MS_B}{MS_E}>F[b-1,(a-1)(b-1);1-\alpha]\ \middle|\ \beta_j\neq 0\text{ for at least one }j\right]\\&=P\{F'[b-1,(a-1)(b-1);\phi_\beta]>F[b-1,(a-1)(b-1);1-\alpha]\},\end{aligned} \tag{3.11.1}$$

where

$$\phi_\beta=\frac{1}{\sigma_e}\left(\frac{a}{b}\sum_{j=1}^{b}\beta_j^2\right)^{1/2}.$$

Similarly, for the hypothesis on the $\alpha_i$'s, we obtain

$$\text{Power}=P\{F'[a-1,(a-1)(b-1);\phi_\alpha]>F[a-1,(a-1)(b-1);1-\alpha]\}, \tag{3.11.2}$$

where

$$\phi_\alpha=\frac{1}{\sigma_e}\left(\frac{b}{a}\sum_{i=1}^{a}\alpha_i^2\right)^{1/2}.$$

The expressions (3.11.1) and (3.11.2) can be evaluated by using noncentral $F$ tables or the Pearson-Hartley charts as described in Section 2.17.
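Instead of tables or charts, the noncentral $F$ probability can be computed directly. The sketch assumes SciPy's `scipy.stats.ncf`, whose noncentrality parameter $\lambda$ is related to the Pearson-Hartley $\phi$ by $\lambda=(\nu_1+1)\phi^2$; for factor A this gives $\lambda=b\sum\alpha_i^2/\sigma_e^2$. The function name and the illustrative effects (those assumed in the worked example of Section 3.15, with $\sigma_e^2$ replaced by 1.917) are assumptions. A value computed this way need not agree exactly with a reading from a chart.

```python
from scipy import stats


def power_factor_a(effects, b, sigma2, alpha=0.05):
    """Power (3.11.2) of the Model I F test for factor A (rows).

    effects: assumed alpha_1..alpha_a (summing to zero); sigma2: assumed sigma_e^2.
    SciPy noncentrality lambda = b * sum(alpha_i^2) / sigma2 = a * phi_alpha^2."""
    a = len(effects)
    df1, df2 = a - 1, (a - 1) * (b - 1)
    lam = b * sum(x * x for x in effects) / sigma2
    critical = stats.f.ppf(1 - alpha, df1, df2)
    return stats.ncf.sf(critical, df1, df2, lam)


power = power_factor_a([-1, -1, 2], b=4, sigma2=1.917)
```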
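Under Model II, the power of the test for the hypothesis

$$H_0:\sigma_\beta^2=0\quad\text{versus}\quad H_1:\sigma_\beta^2>0$$

is given by

$$\begin{aligned}\text{Power}&=P\left[\frac{MS_B}{MS_E}>F[b-1,(a-1)(b-1);1-\alpha]\ \middle|\ \sigma_\beta^2>0\right]\\&=P\left\{F[b-1,(a-1)(b-1)]>\left(1+\frac{a\sigma_\beta^2}{\sigma_e^2}\right)^{-1}F[b-1,(a-1)(b-1);1-\alpha]\right\}.\end{aligned} \tag{3.11.3}$$

Likewise, the power of the test for the hypothesis

$$H_0:\sigma_\alpha^2=0\quad\text{versus}\quad H_1:\sigma_\alpha^2>0$$

is given by

$$\text{Power}=P\left\{F[a-1,(a-1)(b-1)]>\left(1+\frac{b\sigma_\alpha^2}{\sigma_e^2}\right)^{-1}F[a-1,(a-1)(b-1);1-\alpha]\right\}. \tag{3.11.4}$$

Powers of the tests corresponding to the more general hypotheses of the type $H_0:\sigma_\beta^2/\sigma_e^2\le\rho$ can also be obtained similarly.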
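For Model II only central $F$ probabilities are required, so (3.11.3) can be evaluated directly. This is a sketch assuming SciPy (`scipy.stats.f`); the function name and the ratios used in the usage line are illustrative assumptions. With the ratio set to zero, the power reduces to the significance level, which serves as a simple check.

```python
from scipy import stats


def power_random_b(ratio, a, b, alpha=0.05):
    """Power (3.11.3) of the Model II test of H0: sigma_beta^2 = 0 for an assumed
    ratio = sigma_beta^2 / sigma_e^2; only the central F distribution is needed."""
    df1, df2 = b - 1, (a - 1) * (b - 1)
    critical = stats.f.ppf(1 - alpha, df1, df2)
    return stats.f.sf(critical / (1 + a * ratio), df1, df2)


powers = [power_random_b(r, a=5, b=10) for r in (0.0, 0.5, 1.0)]
```

Power for factor A, (3.11.4), follows by exchanging the roles of $a$ and $b$.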

Under Model III, the power of the test for the $\beta_j$ effects involves the central $F$ distribution, and that for the $\alpha_i$ effects involves the noncentral $F$ distribution. The power results for $\sigma_\beta^2$ are then the same as given in (3.11.3), and for the $\alpha_i$'s the results are given by (3.11.2).


3.12 MULTIPLE COMPARISON METHODS 


The results on multiple comparisons discussed in Section 2.19 are also applicable here with a few minor modifications. The procedures can be utilized for the fixed as well as the mixed effects models. For the fixed effects case, as we have seen in Section 3.8, the contrasts of interest may involve the $\alpha_i$'s or the $\beta_j$'s and will be of the form

$$L=\ell_1\alpha_1+\ell_2\alpha_2+\cdots+\ell_a\alpha_a$$

or

$$L'=\ell_1\beta_1+\ell_2\beta_2+\cdots+\ell_b\beta_b,$$

where

$$\sum_{i=1}^{a}\ell_i=0\qquad\text{and}\qquad\sum_{j=1}^{b}\ell_j=0.$$

The estimates of $L$ and $L'$ are given by

$$\hat{L}=\ell_1\bar{y}_{1.}+\ell_2\bar{y}_{2.}+\cdots+\ell_a\bar{y}_{a.}$$

and

$$\hat{L}'=\ell_1\bar{y}_{.1}+\ell_2\bar{y}_{.2}+\cdots+\ell_b\bar{y}_{.b}.$$

The results on Tukey's and Scheffé's methods are the same as given in Section 2.19 except that now $MS_E$ replaces $MS_W$ and $(a-1)(b-1)$ replaces $a(n-1)$ in the degrees of freedom entries. Furthermore, the between-groups mean square of Section 2.19 will be replaced by $MS_A$ or $MS_B$, depending upon whether the inferences are sought on the $\alpha_i$'s or the $\beta_j$'s. Thus, for example, if Tukey's method is used, $L$ is significant at the $\alpha$-level if

$$\frac{|\hat{L}|}{\left(MS_E/b\right)^{1/2}\left(\frac{1}{2}\sum_{i=1}^{a}|\ell_i|\right)}>q[a,(a-1)(b-1);1-\alpha].$$

Similarly, if Scheffé's method is used, $L$ is significant at the $\alpha$-level if

$$\frac{|\hat{L}|}{\left[(a-1)MS_E\sum_{i=1}^{a}\ell_i^2/b\right]^{1/2}}>\{F[a-1,(a-1)(b-1);1-\alpha]\}^{1/2}.$$

Likewise, the significance of the contrast $L'$ can be tested. The other multiple comparison procedures can also be similarly modified.

For a single pairwise comparison, one can use $\sqrt{2}\,t[(a-1)(b-1),1-\alpha/2]$ instead of $T=q[a,(a-1)(b-1);1-\alpha]$. For a limited number $k$ of pairwise comparisons, the Bonferroni method can be employed by using $\sqrt{2}\,t[(a-1)(b-1),1-\alpha/2k]$ instead of $T$.

Under Model III, the contrasts of interest involve only the $\alpha_i$'s, and the results are identical to those given previously.⁵

⁵ For a general discussion of multiple comparison methods in a two-way layout, see Hirotsu (1973).
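The pairwise Tukey and Scheffé rules above can be sketched as follows for row (factor A) means. The critical values are passed in rather than computed, so the sketch needs only the standard library; the values 4.90 and 4.76 in the usage line are the tabled points $q[4,6;0.95]$ and $F[3,6;0.95]$ quoted in Section 3.13, applied to the data of Table 3.3. The function name is hypothetical.

```python
import math
from itertools import combinations


def pairwise_row_comparisons(data, q_crit, f_crit):
    """Tukey and Scheffe pairwise comparisons of row (factor A) means, one
    observation per cell. q_crit = q[a, (a-1)(b-1); 1-alpha] and
    f_crit = F[a-1, (a-1)(b-1); 1-alpha] are supplied from tables."""
    a, b = len(data), len(data[0])
    grand_mean = sum(map(sum, data)) / (a * b)
    row_means = [sum(row) / b for row in data]
    col_means = [sum(data[i][j] for i in range(a)) / a for j in range(b)]
    ss_e = sum((data[i][j] - row_means[i] - col_means[j] + grand_mean) ** 2
               for i in range(a) for j in range(b))
    ms_e = ss_e / ((a - 1) * (b - 1))
    tukey_limit = q_crit * math.sqrt(ms_e / b)
    # A difference of two means has sum(l_i^2) = 1 + 1 = 2.
    scheffe_limit = math.sqrt(f_crit * (a - 1) * ms_e * 2 / b)
    results = []
    for i, k in combinations(range(a), 2):
        diff = abs(row_means[i] - row_means[k])
        results.append((i + 1, k + 1, diff, diff > tukey_limit, diff > scheffe_limit))
    return tukey_limit, scheffe_limit, results


WEAR = [[241, 270, 274], [195, 241, 218], [235, 273, 230], [234, 236, 227]]
tukey, scheffe, pairs = pairwise_row_comparisons(WEAR, q_crit=4.90, f_crit=4.76)
```

Each entry of `pairs` records the two material numbers, the absolute difference of their means, and whether it exceeds the Tukey and the Scheffé limits.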
TABLE 3.3
Loss in Weights Due to Wear Testing of
Four Materials (in mg)

                 Position
Material      1      2      3
   1        241    270    274
   2        195    241    218
   3        235    273    230
   4        234    236    227


3.13 WORKED EXAMPLE FOR MODEL I


Consider the following example described in Davies (1954). An experiment was carried out for wear testing of four materials. A test piece of each material was extracted from each of the three positions of a testing machine. The reduction in weight due to wear was determined, in milligrams, for each piece of material, and the data are given in Table 3.3.

It is desired to test whether there are significant differences due to different materials and machine positions. Clearly, the data of Table 3.3 should be analyzed under Model I, since the four materials and the three positions of the testing machine are especially chosen by the experimenter to be of particular interest to her, and thus both factors will have systematic effects. The mathematical model would be

$$y_{ij}=\mu+\alpha_i+\beta_j+e_{ij}\qquad(i=1,2,3,4;\ j=1,2,3),$$

where $\mu$ is the grand mean, $\alpha_i$ is the effect of the $i$-th material, $\beta_j$ is the effect of the $j$-th position, and the $e_{ij}$'s are random errors, with $\sum_{i=1}^{4}\alpha_i=0$, $\sum_{j=1}^{3}\beta_j=0$, and $e_{ij}\sim N(0,\sigma_e^2)$. It is further assumed that no interaction between material and position is likely to exist.

To perform the analysis of variance computations, we first obtain the row and column totals as

$$y_{1.}=785,\quad y_{2.}=654,\quad y_{3.}=738,\quad y_{4.}=697;$$

$$y_{.1}=905,\quad y_{.2}=1{,}020,\quad y_{.3}=949;$$

and the grand total is

$$y_{..}=2{,}874.$$
The other quantities required in the calculations of the sums of squares are:

$$\frac{y_{..}^2}{ab}=\frac{(2{,}874)^2}{4\times 3}=688{,}323,$$

$$\frac{1}{b}\sum_{i=1}^{a}y_{i.}^2=\frac{1}{3}[(785)^2+(654)^2+(738)^2+(697)^2]=691{,}464.667,$$

$$\frac{1}{a}\sum_{j=1}^{b}y_{.j}^2=\frac{1}{4}[(905)^2+(1{,}020)^2+(949)^2]=690{,}006.500,$$

and

$$\sum_{i=1}^{a}\sum_{j=1}^{b}y_{ij}^2=(241)^2+(270)^2+\cdots+(227)^2=694{,}322.$$

The resultant sums of squares are, therefore, given by

$$SS_T=\sum_{i=1}^{a}\sum_{j=1}^{b}y_{ij}^2-\frac{y_{..}^2}{ab}=694{,}322-688{,}323=5{,}999.000,$$

$$SS_A=\frac{1}{b}\sum_{i=1}^{a}y_{i.}^2-\frac{y_{..}^2}{ab}=691{,}464.667-688{,}323=3{,}141.667,$$

$$SS_B=\frac{1}{a}\sum_{j=1}^{b}y_{.j}^2-\frac{y_{..}^2}{ab}=690{,}006.500-688{,}323=1{,}683.500,$$

and

$$SS_E=SS_T-SS_A-SS_B=5{,}999.000-3{,}141.667-1{,}683.500=1{,}173.833.$$


Finally, the results of the analysis of variance calculations are summarized in 
Table 3.4. 

If we choose the level of significance $\alpha=0.05$, we find from Appendix Table V that $F[3,6;0.95]=4.76$ and $F[2,6;0.95]=5.14$. Comparing these values with the computed $F$ values given in Table 3.4, we do not reject the hypothesis of no “position” effects ($p=0.069$) but reject the hypothesis of no “material” effects ($p=0.039$). That is, we may conclude that there is probably a significant difference due to the materials but not due to positions.
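The arithmetic above and the $F$ tests summarized in Table 3.4 can be reproduced with a short script. This sketch assumes SciPy for the upper-tail $F$ probabilities (`scipy.stats.f.sf`); the function name is illustrative. Its p-values agree, up to rounding, with those in Table 3.4 and in the SAS output of Section 3.18.

```python
from scipy import stats


def anova_two_way(data):
    """Two-way ANOVA with one observation per cell (Model I). Returns, for
    factors "A" (rows) and "B" (columns), the tuple (df, SS, MS, F, p),
    and (df, SS, MS) for "Error"."""
    a, b = len(data), len(data[0])
    cf = sum(map(sum, data)) ** 2 / (a * b)                     # correction factor
    ss_t = sum(y * y for row in data for y in row) - cf
    ss_a = sum(sum(row) ** 2 for row in data) / b - cf
    ss_b = sum(sum(data[i][j] for i in range(a)) ** 2 for j in range(b)) / a - cf
    ss_e = ss_t - ss_a - ss_b
    df_e = (a - 1) * (b - 1)
    ms_e = ss_e / df_e
    table = {}
    for name, df, ss in (("A", a - 1, ss_a), ("B", b - 1, ss_b)):
        f_value = (ss / df) / ms_e
        table[name] = (df, ss, ss / df, f_value, stats.f.sf(f_value, df, df_e))
    table["Error"] = (df_e, ss_e, ms_e)
    return table


WEAR = [[241, 270, 274], [195, 241, 218], [235, 273, 230], [234, 236, 227]]
result = anova_two_way(WEAR)
```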

To determine which materials differ, we use Tukey's and Scheffé's procedures for pairwise comparisons. For Tukey's procedure, we find from Appendix Table X that

$$q[a,(a-1)(b-1);1-\alpha]=q[4,6;0.95]=4.90.$$
TABLE 3.4
Analysis of Variance for the Weight Loss Data of Table 3.3

Source of   Degrees of   Sum of      Mean        Expected Mean Square                                  F Value   p-Value
Variation   Freedom      Squares     Square
Material     3           3,141.667   1,047.222   $\sigma_e^2+\frac{3}{4-1}\sum_{i=1}^{4}\alpha_i^2$    5.35      0.039
Position     2           1,683.500     841.750   $\sigma_e^2+\frac{4}{3-1}\sum_{j=1}^{3}\beta_j^2$     4.30      0.069
Error        6           1,173.833     195.639   $\sigma_e^2$
Total       11           5,999.000


Now, the pairwise differences of sample means for materials would be compared to

$$q[a,(a-1)(b-1);1-\alpha]\sqrt{\frac{MS_E}{b}}=4.90\sqrt{\frac{195.639}{3}}=39.57.$$

The four sample means for the materials are

$$\bar{y}_{1.}=261.67,\quad\bar{y}_{2.}=218.00,\quad\bar{y}_{3.}=246.00,\quad\bar{y}_{4.}=232.33,$$

and there are six pairs of differences to be compared. Furthermore,

$$\begin{aligned}|\bar{y}_{1.}-\bar{y}_{2.}|&=43.67>39.57, & |\bar{y}_{1.}-\bar{y}_{3.}|&=15.67<39.57,\\ |\bar{y}_{1.}-\bar{y}_{4.}|&=29.34<39.57, & |\bar{y}_{2.}-\bar{y}_{3.}|&=28.00<39.57,\\ |\bar{y}_{2.}-\bar{y}_{4.}|&=14.33<39.57, & |\bar{y}_{3.}-\bar{y}_{4.}|&=13.67<39.57.\end{aligned}$$

Hence, we may conclude that materials one and two are probably significantly different but not the others.
For Scheffé's procedure, we find from Appendix Table V that

$$S^2=F[a-1,(a-1)(b-1);1-\alpha]=F[3,6;0.95]=4.76.$$

Furthermore, for the contrasts consisting of the differences between two means, we have

$$\sum_{i=1}^{a}\frac{\ell_i^2}{b}=\frac{1^2+(-1)^2}{3}=\frac{2}{3}.$$
TABLE 3.5
Number of Minutes Observed in Grazing

                               Animal
Observer    1    2    3    4    5    6    7    8    9   10
   1       34   76   75   31   61   82   82   67   72   38
   2       33   76   72   29   60   82   84   67   72   36
   3       35   78   76   30   65   86   88   66   76   37
   4       34   77   71   29   60   78   83   67   72   37
   5       33   77   70   27   59   81   82   67   70   33

Source: John (1971, p. 68). Used with permission.


So the differences among the sample means will now be compared to

$$\sqrt{S^2(a-1)MS_E\sum_{i=1}^{a}\frac{\ell_i^2}{b}}=\sqrt{4.76(3)(195.639)\left(\frac{2}{3}\right)}=43.16.$$

Here, again, we may conclude that materials one and two are probably significantly different but not the others. However, the evidence from Scheffé's method is not as strong as that from Tukey's method. The significance of any other contrasts of interest can also be evaluated similarly.


3.14 WORKED EXAMPLE FOR MODEL II 


The following example is based on a study of the grazing habits of Zebu cattle in Entebbe, Uganda. A group of 10 cattle was observed, and each minute in which an animal was grazing was recorded. A group of five observers was chosen, and the same group was used during the entire experiment. They followed the group of cattle in the same paddock for 88 minutes during one afternoon. The data given in Table 3.5 are taken from John (1971, p. 68) and represent the number of minutes in which observer $i$ ($i=1,\ldots,5$) reported animal $j$ ($j=1,\ldots,10$) grazing.

We now proceed to analyze the data of Table 3.5 under Model II, since the observers and animals in the study can be regarded as random samples from the respective populations of observers and cattle, and the results of the analysis are to be valid for the entire populations. Thus, the factors of observers and animals will both have random effects. The mathematical model would be

$$y_{ij}=\mu+\alpha_i+\beta_j+e_{ij}\qquad(i=1,\ldots,5;\ j=1,\ldots,10),$$

where $\mu$ is the general mean, $\alpha_i$ is the effect of the $i$-th observer, $\beta_j$ is the effect of the $j$-th animal, and the $e_{ij}$'s are random errors, with

$$\alpha_i\sim N(0,\sigma_\alpha^2),\quad\beta_j\sim N(0,\sigma_\beta^2),\quad\text{and}\quad e_{ij}\sim N(0,\sigma_e^2).$$

In addition, the $\alpha_i$'s, $\beta_j$'s, and $e_{ij}$'s are mutually and completely independent.


To perform the analysis of variance computations, we first obtain the row and column totals as

$$y_{1.}=618,\quad y_{2.}=611,\quad y_{3.}=637,\quad y_{4.}=608,\quad y_{5.}=599;$$

$$y_{.1}=169,\ y_{.2}=384,\ y_{.3}=364,\ y_{.4}=146,\ y_{.5}=305,\ y_{.6}=409,\ y_{.7}=419,\ y_{.8}=334,\ y_{.9}=362,\ y_{.10}=181;$$

and the grand total is

$$y_{..}=3{,}073.$$
The other quantities needed in the calculations of the sums of squares are:

$$\frac{y_{..}^2}{ab}=\frac{(3{,}073)^2}{50}=188{,}866.580,$$

$$\frac{1}{b}\sum_{i=1}^{a}y_{i.}^2=\frac{1}{10}[(618)^2+(611)^2+\cdots+(599)^2]=188{,}947.900,$$

$$\frac{1}{a}\sum_{j=1}^{b}y_{.j}^2=\frac{1}{5}[(169)^2+(384)^2+\cdots+(181)^2]=208{,}211.400,$$

and

$$\sum_{i=1}^{a}\sum_{j=1}^{b}y_{ij}^2=(34)^2+(76)^2+\cdots+(33)^2=208{,}367.$$

The resultant sums of squares are, therefore, given by

$$SS_T=\sum_{i=1}^{a}\sum_{j=1}^{b}y_{ij}^2-\frac{y_{..}^2}{ab}=208{,}367-188{,}866.580=19{,}500.420,$$

$$SS_A=\frac{1}{b}\sum_{i=1}^{a}y_{i.}^2-\frac{y_{..}^2}{ab}=188{,}947.900-188{,}866.580=81.320,$$

$$SS_B=\frac{1}{a}\sum_{j=1}^{b}y_{.j}^2-\frac{y_{..}^2}{ab}=208{,}211.400-188{,}866.580=19{,}344.820,$$
TABLE 3.6
Analysis of Variance for the Grazing Data of Table 3.5

Source of   Degrees of   Sum of       Mean        Expected Mean Square            F Value    p-Value
Variation   Freedom      Squares      Square
Observer     4              81.320       20.330   $\sigma_e^2+10\sigma_\alpha^2$      9.85   <0.001
Animal       9           19,344.820   2,149.424   $\sigma_e^2+5\sigma_\beta^2$    1,041.89   <0.001
Error       36              74.280        2.063   $\sigma_e^2$
Total       49           19,500.420

and

$$SS_E=SS_T-SS_A-SS_B=19{,}500.420-81.320-19{,}344.820=74.280.$$


Finally, the results of the analysis of variance calculations are summarized in 
Table 3.6. 

If we choose the level of significance $\alpha=0.05$, we find from Appendix Table V that $F[4,36;0.95]=2.63$ and $F[9,36;0.95]=2.15$. Comparing these values with the computed $F$ values given in Table 3.6, we may conclude that $\sigma_\alpha^2>0$ and $\sigma_\beta^2>0$; that is, there are highly significant differences among observers ($p<0.001$) as well as among animals ($p<0.001$).

Furthermore, to evaluate the relative magnitudes of the variance components $\sigma_e^2$, $\sigma_\beta^2$, and $\sigma_\alpha^2$, we may obtain their unbiased estimates using the formulae (3.7.16), (3.7.17), and (3.7.18), respectively. Hence, we find that

$$\hat{\sigma}_e^2=2.063,$$

$$\hat{\sigma}_\beta^2=\frac{1}{5}(2{,}149.424-2.063)=429.472,$$

$$\hat{\sigma}_\alpha^2=\frac{1}{10}(20.330-2.063)=1.827,$$

and an unbiased estimate of the total variance $\sigma_T^2=\sigma_e^2+\sigma_\beta^2+\sigma_\alpha^2$ is given by

$$\hat{\sigma}_T^2=\hat{\sigma}_e^2+\hat{\sigma}_\beta^2+\hat{\sigma}_\alpha^2=2.063+429.472+1.827=433.362.$$

Now, the estimated proportions of the relative contribution of the variance components to the total variance are:

$$\frac{\hat{\sigma}_e^2}{\hat{\sigma}_T^2}=\frac{2.063}{433.362}=0.005,\qquad\frac{\hat{\sigma}_\beta^2}{\hat{\sigma}_T^2}=\frac{429.472}{433.362}=0.991,$$

and

$$\frac{\hat{\sigma}_\alpha^2}{\hat{\sigma}_T^2}=\frac{1.827}{433.362}=0.004.$$


Thus, we note that about 99 percent of the variation in the observations is attributable to animals. This would probably be the most important finding in the experiment, suggesting that the cattle vary vastly in their grazing habits.
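The variance component estimates and proportions above can be reproduced from the raw data. This sketch uses only the standard library and a hypothetical function name; it applies the moment (ANOVA) method with $E(MS_A)=\sigma_e^2+b\sigma_\alpha^2$ and $E(MS_B)=\sigma_e^2+a\sigma_\beta^2$, as in Table 3.6, to the values of Table 3.5. It does not truncate a negative estimate at zero.

```python
def variance_components(data):
    """ANOVA (moment) estimates for Model II with one observation per cell.
    Rows = factor A (a levels), columns = factor B (b levels)."""
    a, b = len(data), len(data[0])
    cf = sum(map(sum, data)) ** 2 / (a * b)
    ss_t = sum(y * y for row in data for y in row) - cf
    ss_a = sum(sum(row) ** 2 for row in data) / b - cf
    ss_b = sum(sum(data[i][j] for i in range(a)) ** 2 for j in range(b)) / a - cf
    ms_a, ms_b = ss_a / (a - 1), ss_b / (b - 1)
    ms_e = (ss_t - ss_a - ss_b) / ((a - 1) * (b - 1))
    return {"e": ms_e, "alpha": (ms_a - ms_e) / b, "beta": (ms_b - ms_e) / a}


# Table 3.5: rows are observers 1-5, columns are animals 1-10.
GRAZING = [
    [34, 76, 75, 31, 61, 82, 82, 67, 72, 38],
    [33, 76, 72, 29, 60, 82, 84, 67, 72, 36],
    [35, 78, 76, 30, 65, 86, 88, 66, 76, 37],
    [34, 77, 71, 29, 60, 78, 83, 67, 72, 37],
    [33, 77, 70, 27, 59, 81, 82, 67, 70, 33],
]
est = variance_components(GRAZING)
total = sum(est.values())
shares = {name: value / total for name, value in est.items()}
```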
To obtain a 95 percent confidence interval for $\sigma_e^2$, we have

$$MS_E=2.063,\quad\chi^2[36,0.025]=21.34,\quad\text{and}\quad\chi^2[36,0.975]=54.44.$$

Substituting these values in (3.8.2), the desired 95 percent confidence interval for $\sigma_e^2$ is given by

$$P\left[\frac{36\times 2.063}{54.44}<\sigma_e^2<\frac{36\times 2.063}{21.34}\right]=0.95$$

or

$$P[1.364<\sigma_e^2<3.480]=0.95.$$
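The interval computation is easy to script. In this sketch (function name hypothetical) the chi-square percentage points are passed in as tabled values, so only the standard library is needed; they could instead be obtained from a statistics library, for example SciPy's `scipy.stats.chi2.ppf`.

```python
def error_variance_ci(ms_e, df, chi2_low, chi2_high):
    """Interval (3.8.2) for sigma_e^2 from tabled chi-square points:
    chi2_low = chi^2[df, alpha/2] and chi2_high = chi^2[df, 1 - alpha/2]."""
    ss_e = df * ms_e  # error sum of squares
    return ss_e / chi2_high, ss_e / chi2_low


low, high = error_variance_ci(2.063, 36, 21.34, 54.44)
```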


3.15 WORKED EXAMPLE FOR MODEL III


In a plastics manufacturing factory, it is discovered that there is considerable variation in the breaking strength of the plastics produced by three different machines. The raw material is considered to be uniform and hence can be ruled out as a possible source of variability. An experiment was performed to determine the effects of the machine and the operator on the breaking strength. Four operators were randomly selected, and each operator worked on all three machines. The data are given in Table 3.7.

TABLE 3.7
Breaking Strength of the
Plastics (in lbs.)

                 Operator
Machine     1     2     3     4
   1       106   110   106   104
   2       107   111   108   110
   3       109   113   112   111

It is desired to test whether there are significant differences among machines and operators. Since four operators were selected at random from a large pool of operators, who in turn were assigned to three specific machines, the experiment fits the assumptions of the mixed effects model. The mathematical model would be

$$y_{ij}=\mu+\alpha_i+\beta_j+e_{ij}\qquad(i=1,2,3;\ j=1,2,3,4),$$

where $\mu$ is the general mean, $\alpha_i$ is the effect of the $i$-th machine, $\beta_j$ is the effect of the $j$-th operator, and the $e_{ij}$'s are random errors, with

$$\sum_{i=1}^{3}\alpha_i=0,\quad\beta_j\sim N(0,\sigma_\beta^2),\quad\text{and}\quad e_{ij}\sim N(0,\sigma_e^2).$$

It is further assumed that no interaction between machine and operator is likely to exist, and that the $\beta_j$'s and the $e_{ij}$'s are mutually and completely independent.

For the preceding assumptions to be valid, it is, of course, necessary that the experimenter take appropriate measures to ensure that operators are randomly selected from the large pool of operators available. Moreover, systematic errors due to other factors should be avoided, including possible sources of variation in the working conditions and in measuring the breaking strength. Random assignment of raw materials is also important.

To perform the analysis of variance computations, we first obtain the row and column totals as

$$y_{1.}=426,\quad y_{2.}=436,\quad y_{3.}=445;$$

$$y_{.1}=322,\quad y_{.2}=334,\quad y_{.3}=326,\quad y_{.4}=325;$$

and the grand total is

$$y_{..}=1{,}307.$$
The other quantities needed in the calculations of the sums of squares are:

$$\frac{y_{..}^2}{ab}=\frac{(1{,}307)^2}{12}=142{,}354.083,$$

$$\frac{1}{b}\sum_{i=1}^{a}y_{i.}^2=\frac{1}{4}[(426)^2+(436)^2+(445)^2]=142{,}399.250,$$

$$\frac{1}{a}\sum_{j=1}^{b}y_{.j}^2=\frac{1}{3}[(322)^2+(334)^2+(326)^2+(325)^2]=142{,}380.333,$$

and

$$\sum_{i=1}^{a}\sum_{j=1}^{b}y_{ij}^2=(106)^2+(110)^2+\cdots+(111)^2=142{,}437.$$

The resultant sums of squares are, therefore, given by

$$SS_T=\sum_{i=1}^{a}\sum_{j=1}^{b}y_{ij}^2-\frac{y_{..}^2}{ab}=142{,}437-142{,}354.083=82.917,$$

$$SS_A=\frac{1}{b}\sum_{i=1}^{a}y_{i.}^2-\frac{y_{..}^2}{ab}=142{,}399.250-142{,}354.083=45.167,$$

$$SS_B=\frac{1}{a}\sum_{j=1}^{b}y_{.j}^2-\frac{y_{..}^2}{ab}=142{,}380.333-142{,}354.083=26.250,$$

and

$$SS_E=SS_T-SS_A-SS_B=82.917-45.167-26.250=11.500.$$


Finally, the results of the analysis of variance calculations are summarized in 
Table 3.8. 

If we choose the level of significance $\alpha=0.05$, we find from Appendix Table V that $F[2,6;0.95]=5.14$. Since the calculated $F$ value of 11.78 exceeds 5.14 ($p=0.008$), we may conclude that the machines differ significantly. Similarly, we find that $F[3,6;0.95]=4.76$, which is greater than the calculated value of 4.57 ($p=0.054$), and so we do not reject the hypothesis of no operator effects.

To determine which machines differ, we use Tukey's and Scheffé's procedures for paired comparisons. For Tukey's procedure, we find from Appendix Table X that

$$q[a,(a-1)(b-1);1-\alpha]=q[3,6;0.95]=4.34.$$




TABLE 3.8
Analysis of Variance for the Breaking Strength Data of Table 3.7

Source of   Degrees of   Sum of    Mean     Expected Mean Square                                  F Value   p-Value
Variation   Freedom      Squares   Square
Machine      2           45.167    22.583   $\sigma_e^2+\frac{4}{3-1}\sum_{i=1}^{3}\alpha_i^2$    11.78     0.008
Operator     3           26.250     8.750   $\sigma_e^2+3\sigma_\beta^2$                           4.57     0.054
Error        6           11.500     1.917   $\sigma_e^2$
Total       11           82.917


Now, the pairwise differences of sample means for machines are compared to

$$q[a,(a-1)(b-1);1-\alpha]\sqrt{\frac{MS_E}{b}}=4.34\sqrt{\frac{1.917}{4}}=3.00.$$

The three sample means for machines are

$$\bar{y}_{1.}=106.50,\quad\bar{y}_{2.}=109.00,\quad\text{and}\quad\bar{y}_{3.}=111.25,$$

and there are three pairs of differences to be compared. Furthermore,

$$|\bar{y}_{1.}-\bar{y}_{2.}|=2.50<3.00,\qquad|\bar{y}_{1.}-\bar{y}_{3.}|=4.75>3.00,$$

and

$$|\bar{y}_{2.}-\bar{y}_{3.}|=2.25<3.00.$$

Hence, we may conclude that machines one and three are probably significantly different but not the others.
For Scheffé's procedure, we find from Appendix Table V that

$$S^2=F[a-1,(a-1)(b-1);1-\alpha]=F[2,6;0.95]=5.14.$$

Furthermore, for the contrasts consisting of the differences between two means, we have

$$\sum_{i=1}^{a}\frac{\ell_i^2}{b}=\frac{1^2+(-1)^2}{4}=\frac{1}{2}.$$

So the differences among the sample means are now compared to

$$\sqrt{S^2(a-1)MS_E\sum_{i=1}^{a}\frac{\ell_i^2}{b}}=\sqrt{5.14(2)(1.917)\left(\frac{1}{2}\right)}=3.14.$$


Here, again, we conclude that machines one and three are probably significantly different but not the others.

Furthermore, if the experimenter is interested in estimating the magnitudes of the variance components $\sigma_e^2$ and $\sigma_\beta^2$, these may be estimated unbiasedly by using the formulae (3.7.22) and (3.7.23) as

$$\hat{\sigma}_e^2=1.917$$

and

$$\hat{\sigma}_\beta^2=\frac{1}{3}(8.750-1.917)=2.278.$$


Finally, suppose we want to calculate the power of the test when the effect of one machine is higher than the other two by 3 lbs. Thus, we have

$$\alpha_3=3+\alpha_1=3+\alpha_2,$$

which, together with $\sum_{i=1}^{3}\alpha_i=0$, gives $\alpha_1=\alpha_2=-1$ and $\alpha_3=2$. So that

$$\phi=\sqrt{\frac{b}{a}\sum_{i=1}^{a}\frac{\alpha_i^2}{\sigma_e^2}}=\sqrt{\frac{4\{(-1)^2+(-1)^2+(2)^2\}}{3\times 1.917}}=2.04,$$

where an estimate of $\sigma_e^2$ is obtained from $MS_E=1.917$. Now, using the Pearson-Hartley chart given in Appendix Chart II, with $\nu_1=2$, $\nu_2=6$, $\alpha=0.05$, and $\phi=2.04$, the power is found to be about 0.66.
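The calculation of $\phi$ above can be sketched as follows. The function name is hypothetical; besides $\phi$ it returns $\lambda=a\phi^2=b\sum\alpha_i^2/\sigma_e^2$, which is the noncentrality parameter in the form used by, for example, SciPy's `scipy.stats.ncf`, should one prefer a direct computation of the power to a chart reading.

```python
import math


def pearson_hartley_phi(effects, n, sigma2):
    """phi = sqrt(n * sum(effect^2) / (k * sigma2)) for k fixed effects whose level
    means are each based on n observations; also returns lambda = k * phi^2."""
    k = len(effects)
    lam = n * sum(x * x for x in effects) / sigma2
    return math.sqrt(lam / k), lam


# Machine effects assumed above: alpha = (-1, -1, 2), b = 4 operators, MS_E = 1.917.
phi, lam = pearson_hartley_phi([-1, -1, 2], n=4, sigma2=1.917)
```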


3.16 WORKED EXAMPLE FOR MISSING VALUE ANALYSIS 


Consider the loss in weight due to wear testing data given in Table 3.3, and suppose that the observation corresponding to Material 1 and Position 1 is missing. We then have:

$$a=4,\quad b=3,\quad y'_{..}=2{,}633,\quad y'_{1.}=544,\quad\text{and}\quad y'_{.1}=664.$$

Therefore, on substituting in the formula (3.10.1), we obtain

$$\hat{y}_{11}=\frac{4(544)+3(664)-2{,}633}{(4-1)(3-1)}=255.8.$$


This value is then entered in Table 3.3 in place of 241, and all sums of squares are computed as usual. The relevant computations are given in the following. The row and column totals, after substituting for the missing value, are:

$$y_{1.}=799.8,\quad y_{2.}=654,\quad y_{3.}=738,\quad y_{4.}=697;$$

$$y_{.1}=919.8,\quad y_{.2}=1{,}020,\quad y_{.3}=949;$$

and the grand total is

$$y_{..}=2{,}888.8.$$
The other quantities needed in the calculations of the sums of squares are:

$$\frac{y_{..}^2}{ab}=\frac{(2{,}888.8)^2}{4\times 3}=695{,}430.453,$$

$$\frac{1}{b}\sum_{i=1}^{a}y_{i.}^2=\frac{1}{3}[(799.8)^2+(654)^2+(738)^2+(697)^2]=699{,}283.013,$$

$$\frac{1}{a}\sum_{j=1}^{b}y_{.j}^2=\frac{1}{4}[(919.8)^2+(1{,}020)^2+(949)^2]=696{,}758.260,$$

and

$$\sum_{i=1}^{a}\sum_{j=1}^{b}y_{ij}^2=(255.8)^2+(270)^2+\cdots+(227)^2=701{,}674.640.$$

The resultant sums of squares are, therefore, given by

$$SS_T=\sum_{i=1}^{a}\sum_{j=1}^{b}y_{ij}^2-\frac{y_{..}^2}{ab}=701{,}674.640-695{,}430.453=6{,}244.187,$$

$$SS_A=\frac{1}{b}\sum_{i=1}^{a}y_{i.}^2-\frac{y_{..}^2}{ab}=699{,}283.013-695{,}430.453=3{,}852.560,$$

$$SS_B=\frac{1}{a}\sum_{j=1}^{b}y_{.j}^2-\frac{y_{..}^2}{ab}=696{,}758.260-695{,}430.453=1{,}327.807,$$
TABLE 3.9
Analysis of Variance for the Weight Loss Data of Table 3.3 with
One Missing Value

Source of   Degrees of   Sum of      Mean Square   Mean Square   F Value   p-Value
Variation   Freedom      Squares     (Biased)      (Unbiased)
Material    3            3,852.560   1,284.187     987.199       4.64      0.066
Position    2            1,327.807     663.904     576.424       2.71      0.159
Error       6 - 1        1,063.820     212.764     212.764
Total       11 - 1       6,244.187

and

$$SS_E=SS_T-SS_A-SS_B=6{,}244.187-3{,}852.560-1{,}327.807=1{,}063.820.$$


Finally, the results of the analysis of variance calculations are summarized in Table 3.9. Notice that the degrees of freedom for the total and error sums of squares are reduced by one.

To correct for the bias, from equation (3.10.5), the quantity to be subtracted from the material mean square is

$$\frac{[664-(4-1)(255.8)]^2}{4(4-1)^2}=296.988.$$

This gives $1{,}284.187-296.988=987.199$ for the corrected mean square. Similarly, from equation (3.10.6), the quantity to be subtracted from the position mean square is

$$\frac{[544-(3-1)(255.8)]^2}{3(3-1)^2}=87.480.$$

This gives $663.904-87.480=576.424$ for the corrected mean square. The corrected mean squares, the variance ratios, and the associated p-values are also shown in Table 3.9. Note that the conclusions drawn in the Worked Example in Section 3.13 regarding differences in material and position effects are affected: with the corrected mean squares, the material effect is no longer significant at the 5 percent level ($p=0.066$).

From equation (3.10.2), the estimated standard error of the sample mean of material 1 (the material with the missing value) is

$$\widehat{SE}(\bar{y}_{1.})=\left\{\left[\frac{1}{3}+\frac{4}{3(4-1)(3-1)}\right](212.764)\right\}^{1/2}=10.872.$$

Similarly, from equation (3.10.3), the estimated standard error of the difference between the means for materials 1 and 2 is

$$\widehat{SE}(\bar{y}_{1.}-\bar{y}_{2.})=\left\{\left[\frac{2}{3}+\frac{4}{3(4-1)(3-1)}\right](212.764)\right\}^{1/2}=13.752.$$

The same standard error applies for the comparison of the mean of material 1 with the means of materials 3 and 4. In contrast, the estimated standard error for the comparison of the means of a pair of materials with no missing values (i.e., 2 vs. 3, 2 vs. 4, and 3 vs. 4) is given by $\sqrt{2(212.764)/3}=11.910$.
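The three standard errors can be checked with a short sketch of (3.10.2), (3.10.3), and the complete-data formula, with $\sigma_e^2$ replaced by the error mean square. The function name is illustrative, and the inputs are those of the example above ($MS_E=212.764$, $a=4$, $b=3$).

```python
import math


def missing_value_standard_errors(ms_e, a, b):
    """Estimated standard errors when row i has one missing value: the row mean
    (3.10.2), its difference from a complete row (3.10.3), and the difference
    between two complete rows."""
    extra = a / (b * (a - 1) * (b - 1))  # term contributed by the estimated cell
    se_mean = math.sqrt((1 / b + extra) * ms_e)
    se_diff = math.sqrt((2 / b + extra) * ms_e)
    se_diff_complete = math.sqrt(2 * ms_e / b)
    return se_mean, se_diff, se_diff_complete


se_mean, se_diff, se_complete = missing_value_standard_errors(212.764, a=4, b=3)
```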


3.17 USE OF STATISTICAL COMPUTING PACKAGES 


Two-way fixed effects analysis of variance with one observation per cell and 
no missing values can be performed using the SAS ANOVA procedure. The 
missing observations could be estimated and then the ANOVA procedure could 
be employed to perform the modified analysis as described in Section 3.10. For 
random and mixed model analysis of variance, the F tests remain unchanged, 
and no special analysis is required. The moment estimates of variance compo- 
nents can easily be computed from the entries of the analysis of variance table. 
For estimating variance components, using other methods, PROC MIXED or 
VARCOMP can be used. The details of SAS commands for executing these 
procedures are given in Section 11.1. 

Among SPSS procedures, ANOVA, MANOVA, or GLM could be used, although ANOVA is the simplest. The analysis of data with missing values can be handled as indicated previously. As before, no special tests are required for random and mixed effects analyses. Further, SPSS Release 7.5 includes a VARCOMP procedure that provides three methods for estimating variance components. For instructions regarding SPSS commands, see Section 11.2.

In the BMDP package, programs 7D and 2V are suited to this model when the analysis involves only fixed effects. For analyses involving random and mixed effects models, programs 3V and 8V are recommended.


3.18 WORKED EXAMPLES USING STATISTICAL PACKAGES 


In this section, we illustrate the use of statistical packages to perform two-way analysis of variance with one observation per cell for the data sets employed in the examples presented in Sections 3.13 through 3.15. Figures 3.1, 3.2, and 3.3 show the program instructions and the output for analyzing the data in Tables 3.3, 3.5, and 3.7 using the SAS ANOVA/GLM, SPSS ANOVA/GLM, and BMDP 2V/8V procedures. The typical output lists the data format at the top, followed by the cell means and the entries of the analysis of variance table. Note that in each case the results agree with those obtained by manual computation in Sections 3.13 through 3.15.
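The manual computations that the packages reproduce can be sketched in a few lines of Python. The function below is a hypothetical helper, not part of any package; it computes the sums of squares, degrees of freedom, mean squares, and variance ratios for the additive two-way model with one observation per cell. It is applied to the breaking-strength data of Table 3.7, taken here from the data lines in the BMDP 8V instructions of Figure 3.3, with machines as rows and operators as columns. Because the F tests do not change when operators are treated as random, the ratios should match the mixed model output in Figure 3.3 (11.78 and 4.57).

```python
from statistics import fmean

def two_way_anova_one_per_cell(y):
    """ANOVA for the additive two-way model with one observation per cell.

    y[i][j] is the response at level i of factor A and level j of factor B.
    Returns {source: (SS, df, MS, F)}; MS and F are None where not defined.
    """
    a, b = len(y), len(y[0])
    grand = fmean(v for row in y for v in row)
    row_means = [fmean(row) for row in y]
    col_means = [fmean(y[i][j] for i in range(a)) for j in range(b)]
    ss_a = b * sum((m - grand) ** 2 for m in row_means)
    ss_b = a * sum((m - grand) ** 2 for m in col_means)
    ss_t = sum((v - grand) ** 2 for row in y for v in row)
    ss_e = ss_t - ss_a - ss_b          # residual = interaction used as error
    df_a, df_b, df_e = a - 1, b - 1, (a - 1) * (b - 1)
    ms_a, ms_b, ms_e = ss_a / df_a, ss_b / df_b, ss_e / df_e
    return {
        "A": (ss_a, df_a, ms_a, ms_a / ms_e),
        "B": (ss_b, df_b, ms_b, ms_b / ms_e),
        "Error": (ss_e, df_e, ms_e, None),
        "Total": (ss_t, a * b - 1, None, None),
    }

# Breaking strengths: rows = machines 1-3, columns = operators 1-4
y = [[106, 110, 106, 104],
     [107, 111, 108, 110],
     [109, 113, 112, 111]]
table = two_way_anova_one_per_cell(y)
for source, entries in table.items():
    print(source, entries)
```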
DATA WEARTEST;
INPUT MATERIAL POSITION WEIGHT;
DATALINES;
1 1 241
...
4 3 227
;
PROC ANOVA;
CLASSES MATERIAL POSITION;
MODEL WEIGHT=MATERIAL POSITION;
RUN;

CLASS      LEVELS   VALUES
MATERIAL   4        1 2 3 4
POSITION   3        1 2 3
NUMBER OF OBS. IN DATA SET=12

The SAS System
Analysis of Variance Procedure
Dependent Variable: WEIGHT
                             Sum of        Mean
Source              DF      Squares      Square   F Value   Pr > F
Model                5    4825.1667    965.0333      4.93   0.0388
Error                6    1173.8333    195.6389
Corrected Total     11    5999.0000

R-Square     C.V.        Root MSE    WEIGHT Mean
0.804328     5.840124    13.987      239.50

Source      DF    Anova SS     Mean Square   F Value   Pr > F
MATERIAL     3    3141.6667    1047.2222        5.35   0.0393
POSITION     2    1683.5000     841.7500        4.30   0.0693

(i) SAS application: SAS ANOVA instructions and output for the two-way fixed effects analysis of variance with one observation per cell.


DATA LIST
 /MATERIAL 1
  POSITION 3
  WEIGHT 5-7.
BEGIN DATA.
1 1 241
1 2 270
...
4 3 227
END DATA.
ANOVA WEIGHT BY
  MATERIAL (1, 4)
  POSITION (1, 3)
 /MAXORDER=NONE
 /STATISTICS=ALL.

ANOVA (a,b)
                                           Unique Method
                                     Sum of Squares   df   Mean Square
WEIGHT   Main Effects   (Combined)       4825.167      5       965.033
                        MATERIAL         3141.667      3      1047.222
                        POSITION         1683.500      2       841.750
         Model                           4825.167      5       965.033
         Residual                        1173.833      6       195.639
         Total                           5999.000     11       545.364
a WEIGHT by MATERIAL, POSITION
b All effects entered simultaneously

(ii) SPSS application: SPSS ANOVA instructions and output for the two-way fixed effects analysis of variance with one observation per cell.


/INPUT FILE='C:\SAHAI\TEXTO\EJE5.TXT'.
FORMAT=FREE.
VARIABLES=3.
NAMES=MAT,POS,WEIGHT.
VARIABLE=MAT,POS.
CODES(MAT)=1,2,3,4.
NAMES(MAT)=M1,M2,M3,M4.
CODES(POS)=1,2,3.
NAMES(POS)=P1,P2,P3.
DEPENDENT=WEIGHT.

BMDP2V - ANALYSIS OF VARIANCE AND COVARIANCE WITH REPEATED MEASURES
Release: 7.0 (BMDP/DYNAMIC)

ANALYSIS OF VARIANCE FOR THE 1-ST DEPENDENT VARIABLE
THE TRIALS ARE REPRESENTED BY THE VARIABLES: WEIGHT
THE HIGHEST ORDER INTERACTION IN EACH TABLE HAS BEEN REMOVED FROM THE MODEL SINCE THERE IS ONE SUBJECT PER CELL

SOURCE       SUM OF SQUARES   D.F.    MEAN SQUARE         F
MEAN           688323.00000      1   688323.00000   3518.33
MATERIAL         3141.66667      3     1047.22222      5.35
POSITION         1683.50000      2      841.75000      4.30
ERROR            1173.83333      6      195.63889

(iii) BMDP application: BMDP 2V instructions and output for the two-way fixed effects analysis of variance with one observation per cell.


FIGURE 3.1 Program Instructions and Output for the Two-Way Fixed Effects 
Analysis of Variance with One Observation per Cell: Data on Loss in Weights Due 
to Wear Testing of Four Materials (Table 3.3) 


DATA ANMLGRAZ;
INPUT OBSERVER ANIMAL GRAZING;
DATALINES;
1 1 34
...
5 10 33
;
PROC GLM;
CLASSES OBSERVER ANIMAL;
MODEL GRAZING=OBSERVER ANIMAL;
RANDOM OBSERVER ANIMAL;
RUN;

CLASS      LEVELS   VALUES
OBSERVER   5        1 2 3 4 5
ANIMAL     10       1 2 3 4 5 6 7 8 9 10
NUMBER OF OBS. IN DATA SET=50

The SAS System
General Linear Models Procedure
Dependent Variable: GRAZING
                             Sum of        Mean
Source              DF      Squares      Square   F Value   Pr > F
Model               13    19426.140    1494.318    724.23   0.0001
Error               36       74.280       2.063
Corrected Total     49    19500.420

R-Square     C.V.        Root MSE    GRAZING Mean
0.996191     2.337180    1.4364      61.460

Source      DF    Type I SS    Mean Square   F Value   Pr > F
OBSERVER     4       81.320       20.330        9.85   0.0001
ANIMAL       9    19344.820     2149.424     1041.72   0.0001

Source      DF    Type III SS  Mean Square   F Value   Pr > F
OBSERVER     4       81.320       20.330        9.85   0.0001
ANIMAL       9    19344.820     2149.424     1041.72   0.0001

Source      Type III Expected Mean Square
OBSERVER    Var(Error) + 10 Var(OBSERVER)
ANIMAL      Var(Error) + 5 Var(ANIMAL)

(i) SAS application: SAS GLM instructions and output for the two-way random effects analysis of variance with one observation per cell.


DATA LIST
 /OBSERVER 1
  ANIMAL 3-4
  GRAZING 6-7.
BEGIN DATA.
1 1 34
1 2 76
1 3 75
...
5 10 33
END DATA.
GLM GRAZING BY
  OBSERVER ANIMAL
 /DESIGN
  OBSERVER
  ANIMAL
 /RANDOM
  OBSERVER
  ANIMAL.

Tests of Between-Subjects Effects     Dependent Variable: GRAZING
Source                  Type III SS    df   Mean Square          F    Sig.
OBSERVER   Hypothesis        81.320     4        20.330      9.853    .000
           Error             74.280    36      2.063(a)
ANIMAL     Hypothesis     19344.820     9      2149.424   1041.724    .000
           Error             74.280    36      2.063(a)
a MS(Error)

Expected Mean Squares (a,b)
                         Variance Component
Source      Var(OBSERVER)   Var(ANIMAL)   Var(Error)
OBSERVER        10.000          .000          1.000
ANIMAL            .000         5.000          1.000
Error             .000          .000          1.000
a For each source, the expected mean square equals the sum of the coefficients in the cells times the variance components, plus a quadratic term involving effects in the Quadratic Term cell.
b Expected Mean Squares are based on the Type III Sums of Squares.

(ii) SPSS application: SPSS GLM instructions and output for the two-way random effects analysis of variance with one observation per cell.


/INPUT FILE='C:\SAHAI\TEXTO\EJE6.TXT'.
FORMAT=FREE.
VARIABLES=10.
/VARIABLE NAMES=A1,...,A10.
/DESIGN NAMES=OBSR,ANIM.
LEVELS=5,10.
RANDOM=OBSR,ANIM.
MODEL='O,A'.
34 76 75 31 61 82 82 67 72 38
...
33 77 70 27 59 81 82 67 70 33

ANALYSIS OF VARIANCE DESIGN
INDEX                OBSR    ANIM
NUMBER OF LEVELS        5      10
POPULATION SIZE       INF     INF
MODEL  O, A

BMDP8V - GENERAL MIXED MODEL ANALYSIS OF VARIANCE - EQUAL CELL SIZES
Release: 7.0 (BMDP/DYNAMIC)

ANALYSIS OF VARIANCE FOR DEPENDENT VARIABLE 1
   SOURCE     ERROR   SUM OF         D.F.   MEAN            F        TAIL
              TERM    SQUARES               SQUARE                   PROB.
1  MEAN               188866.5800      1    188866.580
2  OBSERVER   OA          81.3200      4        20.330      9.85   0.0000
3  ANIMAL     OA       19344.8200      9      2149.424   1041.72   0.0000
4  OA                     74.2800     36         2.063

   SOURCE     EXPECTED MEAN SQUARE        ESTIMATES OF VARIANCE COMPONENTS
1  MEAN       50(1)+10(2)+5(3)+(4)        3733.97778
2  OBSERVER   10(2)+(4)                      1.82667
3  ANIMAL     5(3)+(4)                     429.47222
4  OA         (4)                            2.06333

(iii) BMDP application: BMDP 8V instructions and output for the two-way random effects analysis of variance with one observation per cell.


FIGURE 3.2 Program Instructions and Output for the Two-Way Random Effects 
Analysis of Variance with One Observation per Cell: Data on Number of Minutes 
Observed in Grazing (Table 3.5). 


DATA BREAKSTR;
INPUT MACHINE OPERATOR BREAKING;
DATALINES;
1 1 106
1 2 110
...
3 4 111
;
PROC GLM;
CLASSES MACHINE OPERATOR;
MODEL BREAKING=MACHINE OPERATOR;
RANDOM OPERATOR;
RUN;

CLASS      LEVELS   VALUES
MACHINE    3        1 2 3
OPERATOR   4        1 2 3 4
NUMBER OF OBS. IN DATA SET=12

The SAS System
General Linear Models Procedure
Dependent Variable: BREAKING
                             Sum of        Mean
Source              DF      Squares      Square   F Value   Pr > F
Model                5      71.4167     14.2833      7.45   0.0149
Error                6      11.5000      1.9167
Corrected Total     11      82.9167

R-Square     C.V.        Root MSE    BREAKING Mean
0.861307     1.271098    1.3844      108.92

Source      DF    Type I SS    Mean Square   F Value   Pr > F
MACHINE      2      45.1667       22.5833      11.78   0.0084
OPERATOR     3      26.2500        8.7500       4.57   0.0543

Source      DF    Type III SS  Mean Square   F Value   Pr > F
MACHINE      2      45.1667       22.5833      11.78   0.0084
OPERATOR     3      26.2500        8.7500       4.57   0.0543

Source      Type III Expected Mean Square
MACHINE     Var(Error) + Q(MACHINE)
OPERATOR    Var(Error) + 3 Var(OPERATOR)

(i) SAS application: SAS GLM instructions and output for the two-way mixed effects analysis of variance with one observation per cell.


DATA LIST
 /MACHINE 1
  OPERATOR 3
  BREAKING 5-7.
BEGIN DATA.
1 1 106
1 2 110
1 3 106
...
3 4 111
END DATA.
GLM BREAKING BY
  MACHINE OPERATOR
 /DESIGN
  MACHINE
  OPERATOR
 /RANDOM
  OPERATOR.

Tests of Between-Subjects Effects     Dependent Variable: BREAKING
Source                  Type III SS    df   Mean Square          F    Sig.
MACHINE    Hypothesis        45.167     2        22.583     11.783    .008
           Error             11.500     6      1.917(a)
OPERATOR   Hypothesis        26.250     3         8.750      4.565    .054
           Error             11.500     6      1.917(a)
a MS(Error)

Expected Mean Squares (a,b)
                       Variance Component
Source      Var(OPERATOR)   Var(Error)   Quadratic Term
MACHINE          .000          1.000      Machine
OPERATOR        3.000          1.000
Error            .000          1.000
a For each source, the expected mean square equals the sum of the coefficients in the cells times the variance components, plus a quadratic term involving effects in the Quadratic Term cell.
b Expected Mean Squares are based on the Type III Sums of Squares.

(ii) SPSS application: SPSS GLM instructions and output for the two-way mixed effects analysis of variance with one observation per cell.


/INPUT FILE='C:\SAHAI\TEXTO\EJE7.TXT'.
FORMAT=FREE.
VARIABLES=4.
/VARIABLE NAMES=O1,...,O4.
/DESIGN NAMES=M,O.
LEVELS=3,4.
RANDOM=O.
MODEL='M,O'.
/END
106 110 106 104
107 111 108 110
109 113 112 111

ANALYSIS OF VARIANCE DESIGN
INDEX                   M       O
NUMBER OF LEVELS        3       4
POPULATION SIZE       INF     INF
MODEL  M, O

BMDP8V - GENERAL MIXED MODEL ANALYSIS OF VARIANCE - EQUAL CELL SIZES
Release: 7.0 (BMDP/DYNAMIC)

ANALYSIS OF VARIANCE FOR DEPENDENT VARIABLE 1
   SOURCE     ERROR   SUM OF         D.F.   MEAN            F        TAIL
              TERM    SQUARES               SQUARE                   PROB.
1  MEAN               142354.0833      1    142354.083
2  MACHINE    MO          45.1667      2        22.583     11.78   0.0084
3  OPERATOR   MO          26.2500      3         8.750      4.57   0.0543
4  MO                     11.5000      6         1.917

   SOURCE     EXPECTED MEAN SQUARE        ESTIMATES OF VARIANCE COMPONENTS
1  MEAN       12(1)+4(2)+3(3)+(4)         11860.38889
2  MACHINE    4(2)+(4)                        5.16667
3  OPERATOR   3(3)+(4)                        2.27778
4  MO         (4)                             1.91667

(iii) BMDP application: BMDP 8V instructions and output for the two-way mixed effects analysis of variance with one observation per cell.


FIGURE 3.3 Program Instructions and Output for the Two-Way Mixed Effects
Analysis of Variance with One Observation per Cell: Data on Breaking Strength 
of the Plastics (Table 3.7) 
3.19 EFFECTS OF VIOLATIONS OF ASSUMPTIONS OF THE MODEL


For the two-way crossed model (3.1.1) with a single observation per cell and no interaction, Welch (1937) and Pitman (1938) compared the moments of the beta statistic, which is related to the usual F statistic, under the assumption of normality and under permutation theory. The close agreement between the two sets of results supports the robustness of the F test, as in the case of the one-way classification. For a further discussion of this point, the reader is referred to Hinkelmann and Kempthorne (1994, Chapter 8).

The effect of unequal error variances on inferences for the model (3.1.1) was studied by Box (1954b). The results showed that minor departures from homoscedasticity have only small effects. If the variances differ from row to row but are constant over columns, then the actual α-level of the test for row effects is slightly greater than the nominal value; for the test of column effects, the reverse is true.

Box (1954b) also studied the effect of a first-order serial correlation be- 
tween rows within columns. It was found that treatment (row) comparisons are 
not appreciably affected by serial correlation between treatment measurements 
within a block (column). However, serial correlations among the measurements 
on each treatment can seriously affect the validity of treatment comparisons. 

For a discussion of effects of violation of assumptions under Models II and 
III, see Section 4.19. 


EXERCISES 


1. The drained weights of frozen oranges were measured for various compositions and concentrations of a drink. The original weights in each case were the same, so any observed differences in drained weights can be attributed to differences in the concentration or composition of the drink. The relevant data on weights (oz.) are as follows.

                            Composition
Concentration (%)     C1       C2       C3       C4
20                  21.52    21.32    22.19    22.19
30                  22.32    21.52    22.32    23.15
40                  22.56    23.12    22.42    21.52
50                  23.31    22.15    21.32    22.16

(a) Describe the mathematical model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Does the concentration of the drink have a significant effect on the drained weight? Use α = 0.05.

(d) Does the composition of the drink have a significant effect on the drained weight? Use α = 0.05.
(e) If there are significant differences in drained weights due to drink concentration, use a suitable multiple comparison method to determine which concentrations differ. Use α = 0.01.

(f) Same as part (e) but for drink composition.

2. Three levels of fertilizer in combination with two levels of irrigation were employed in a field experiment. The six treatment combinations were randomly assigned to plots. The relevant data on yields are as follows.

                         Level of Fertilizer
Level of Irrigation    High    Medium    Low
Yes                     380      340     305
No                      330      360     340

(a) Describe the appropriate mathematical model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Are there significant differences among the levels of the fertilizer? Use α = 0.05.

(d) Obtain point and interval estimates of the overall difference in yield due to irrigation.

(e) Obtain point and interval estimates of the mean difference in yield between high and low fertilizer levels.

3. An experiment was conducted to assess the effect of four brands of cutting fluids on the abrasive wear of four types of cutting tools. Wear was measured as the logarithm of the loss of tool flank weight (in grams times 100) in a 1-hour test run. The relevant results are summarized as follows.

                    Cutting Fluid Brand
Cutting Tool     CF1      CF2      CF3      CF4
CT1            1.171    1.057    1.061    1.011
CT2            0.705    0.612    0.631    0.598
CT3            0.538    0.418    0.457    0.412
CT4            0.414    1.308    1.371    1.251

(a) Describe the mathematical model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Do the fluid brands have a significant effect on the measure of abrasive wear? Use α = 0.05.

(d) Does the type of cutting tool have a significant effect on the measure of abrasive wear? Use α = 0.05.

(e) If there are significant differences in the measure of abrasive wear due to the fluid brand, use a suitable multiple comparison method to determine which fluid brands differ. Use α = 0.01.
(f) Same as part (e) but for the type of cutting tool.

4. During the manufacturing process of a certain component, its breaking strength was measured at three operating temperatures and, for each temperature, at seven furnace pressures. The relevant data in certain standard units are given as follows.

                                    Pressure
Temperature    P1      P2      P3      P4      P5      P6      P7
T1          0.803   0.836   1.303   1.276   1.161   1.054   1.307
T2          0.705   0.630   1.005   1.062   0.616   0.803   0.618
T3          1.321   0.815   0.771   1.110   0.710   1.022   0.717

(a) Describe the mathematical model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Do the operating temperatures have a significant effect on the breaking strength? Use α = 0.05.

(d) Do the furnace pressures have a significant effect on the breaking strength? Use α = 0.05.

(e) If there are significant differences in breaking strength due to temperatures, use a suitable multiple comparison method to determine which temperatures differ. Use α = 0.01.

(f) Same as part (e) but for the furnace pressure.

5. A university computing department manages four resource centers on the campus. Each center houses one time-sharing terminal and two types of personal computers. During a given week, the number of hours each type of computing machine was in use was recorded; the relevant data are as follows.

                               Resource Center
Equipment                  1      2      3      4
Time-Sharing Terminal     70     70     50     60
Apple Computer            40     40     20     40
IBM Computer              30     30     10     30

(a) Describe the mathematical model you will employ to analyze the effects of resource center location and the type of computing machine.

(b) Analyze the data and report the analysis of variance table.

(c) Perform appropriate F tests to determine whether the two factors have main effects. Use α = 0.05.

6. A factorial experiment was performed to study the effect of the level of pressure and the level of temperature on the compressive strength of thermoplastics. The relevant data in certain standard units are given as follows.

                        Temperature (°F)
Pressure (lb/in.²)    250      260      270
120                  8.00    10.57     8.30
130                  8.01     9.40     8.86
140                  7.72    10.30     8.32
150                  8.14     9.73     8.01

(a) Describe the mathematical model and the assumptions for the experiment considering both factors to be fixed.

(b) Analyze the data and report the analysis of variance table.

(c) Does the level of temperature have a significant effect on the compressive strength? Use α = 0.05.

(d) Does the level of pressure have a significant effect on the compressive strength? Use α = 0.05.

(e) If there are significant differences in the compressive strength due to the temperature, use a suitable multiple comparison method to determine which temperatures differ. Use α = 0.01.

(f) Same as part (e) but for the pressure.

(g) Suppose the temperature and pressure levels are selected at random. Assuming that there is no interaction between pressure and temperature, state the analysis of variance model and report appropriate conclusions. How do your conclusions differ from those obtained in parts (c) and (d)?
7. A study was conducted to determine the effect of the size of a group on the results of a brainstorming session. Three different types of company executives were used, and one group of each type was formed for each group size. Each group was assigned a problem and was given an hour to generate ideas. The variable of interest was the number of new ideas proposed. The relevant data are given as follows.

                                Size of Group
Type of Group                2     3     4     5
Sales executives            22    30    38    34
Advertising executives      19    25    32    31
Marketing executives        16    20    26    28

(a) Describe the mathematical model you will employ to analyze the effects of group size and the type of group on the number of new ideas being proposed.

(b) Perform appropriate F tests to determine whether the factors have any main effects. Use α = 0.05.

(c) If there are significant differences between the types of groups, use a suitable multiple comparison method to determine which group types differ. Use α = 0.01.
8. Consider a two-factor experiment designed to investigate the breaking strength of a bond of pieces of material. There are 5 ingots of a composition material, each used with one of three metals as the bonding agent. The data on the amount of pressure required to break a bond from an ingot that uses one of the metals as the bonding agent are given as follows.

              Type of Metal
Ingot    Copper    Iron    Nickel
1          83.3    83.0     78.1
2          77.5    79.9     78.6
3          85.6    93.8     87.1
4          78.4    89.2     83.8
5          84.3    85.3     84.2

(a) Describe the mathematical model and the assumptions for the experiment. It is assumed that metals are fixed and ingots are random.

(b) Analyze the data and report the analysis of variance table.

(c) Do the metals have a significant effect on the breaking strength? Use α = 0.05.

(d) Do the ingots have a significant effect on the breaking strength? Use α = 0.05.

(e) If there are significant differences in the breaking strength due to metals, use a suitable multiple comparison method to determine which metals differ. Use α = 0.01.

(f) Obtain point and interval estimates of the variance components associated with the assumptions of the model given in part (a).

9. Mosteller and Tukey (1977, p. 503) reported data from a study in which six experimenters measured the specific heat of water at various temperatures. The interest lies in investigating the reliability of the measurements and in determining an accurate estimate of the specific heat. The data are given as follows.

                                 Temperature (°C)
Investigator       5         10        15        20        25        30
Lüdin           1.0027    1.0010    1.0000    0.9994    0.9993    0.9996
Dieterici       1.0050    1.0021    1.0000    0.9987    0.9983    0.9984
Bousfield       1.0039    1.0016    1.0000    0.9991    0.9989    0.9990
Rowland         1.0054    1.0019    1.0000    0.9979    0.9972    0.9969
Bartollis       1.0041    1.0017    1.0000    0.9994    1.0000    1.0016
Janke           1.0040    1.0016    1.0000    0.9991    0.9987    0.9988

Source: Mosteller and Tukey (1977, p. 503). Used with permission.


Two-Way Crossed Classification Without Interaction 


(a) Describe the mathematical model and the assumptions for the experiment. Would you use Model I, Model II, or Model III? Explain.

(b) Analyze the data and report the analysis of variance table.

(c) Does the level of the temperature have a significant effect on the measurement of specific heat? Use α = 0.05.

(d) Does the investigator have a significant effect on the measurement of specific heat? Use α = 0.05.

(e) If there are significant differences in specific heats due to temperature, use a suitable multiple comparison method to determine which levels of the temperature differ. Use α = 0.01.

(f) Same as in part (e) but for the investigator.

(g) If you assumed the investigator to be a random factor, determine the point and interval estimates of the variance components of the model.
10. Weekes (1983, Table 1.1) reported data of Michelson and Morley on the speed of light. The data came from 5 experiments, each consisting of 20 consecutive runs. The results are given below, where each reported measurement is the speed of light in suitable units.


Experiment 
Run 1 2 3 4 5 
1 850 960 880 890 890 
2 740 940 880 810 840 
3 900 960 880 810 780 
4 1070 940 860 820 810 
5 930 880 720 800 760 
6 850 800 720 770 810 
7 950 850 620 760 790 
8 980 880 860 740 810 
9 980 900 970 750 820 
10 880 840 950 760 850 
11 1000 830 880 910 870 
12 980 790 910 920 870 
13 930 810 850 890 810 
14 650 880 870 860 740 
15 760 880 840 880 810 
16 810 830 840 720 940 
17 1000 800 850 840 950 
18 1000 790 840 850 800 
19 960 760 840 850 810 
20 960 800 840 780 870 


Source: Weekes (1983, Table 1.1). Used with permission. 
(a) Describe the mathematical model and the assumptions for the experiment. Would you use Model I, Model II, or Model III? Explain.

(b) Analyze the data and report the analysis of variance table.

(c) Does the experiment have a significant effect on the measurement of the speed of light? Use α = 0.05.

(d) Does the run have a significant effect on the measurement of the speed of light? Use α = 0.05.

(e) Assuming that the experiment and run effects are random, estimate the variance components of the model and determine their relative importance.

11. Berry (1987) reported data from an experiment designed to investigate the performance of different types of electrodes. Five different types of electrodes were applied to the arms of 16 subjects, and the readings for resistance were taken. The data are given below, where the measures of resistance are in the original units of kilohms.

               Electrode Type
Subject     1      2      3      4      5
1         500    400     98    200    250
2         600    600    600     75    310
3         250    370    220    250    220
4          72    140    240     33     54
5         135    300    450    430     70
6          27     84    135    190    180
7         100     50     82     73     78
8         105    180     32     58     32
9          90    180    220     34     64
10        200    290    320    280    135
11         15     45     75     88     80
12        160    200    300    300    220
13        250    400     50     50     92
14        170    310    230     20    150
15         66   1000   1050    280    220
16        107     48     26     45     51

Source: Berry (1987). Used with permission.

(a) Describe the mathematical model and the assumptions for the experiment. Would you use Model I, Model II, or Model III? Explain.

(b) Analyze the data and report the analysis of variance table.

(c) Does the electrode type have a significant effect on the measures of resistance? Use α = 0.05.
(d) Does the subject have a significant effect on the measures of resistance? Use α = 0.05.

(e) If there are significant differences in resistance due to electrode type and it is considered to be a fixed effect, use a suitable multiple comparison method to determine which electrode types differ. Use α = 0.01.

(f) If you assumed any of the effects to be random, determine the point and interval estimates of the variance components of the model.

(g) The measures of resistance are found to have a distribution that is skewed to the right. Make a logarithmic transformation of the data and repeat the analyses carried out in parts (b) through (f).


Two-Way Crossed Classification with Interaction


4.0 PREVIEW 


Suppose that we relax the requirement of model (3.1.1) that there be exactly one observation in each of the a × b cells of the two-way layout. The model remains the same except that we now use $y_{ijk}$ to designate the k-th observation at the i-th level of A and the j-th level of B, that is, in the (i, j)-th cell. We now suppose that there are n (n > 1) observations in each cell; with n = 1, model (3.1.1) is a special case of the model considered here. For an arbitrary integer value of n, the analysis of variance is a simple extension of that described in Chapter 3. However, an important and somewhat restrictive implication of the simple additive model discussed in Chapter 3 is that the difference between the mean responses at two levels of A is the same at each level of B. In many cases, this simple additive model may not be appropriate. The failure of the differences between the mean responses at the different levels of A to remain constant over the different levels of B is attributed to interaction between the two factors. Having more than one observation per cell allows a researcher to investigate the main effects of both factors as well as their interaction. In this chapter, we study the two-factor model with interaction terms.
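The additivity condition described above can be checked directly on a table of cell means. The Python sketch below uses hypothetical cell means (illustrative values, not data from the text) for a 2 × 3 layout: under additivity, the difference between two levels of A is the same at every level of B, whereas interaction makes that difference vary. The function names are illustrative only.

```python
def differences_between_a_levels(mu, i1, i2):
    """Return mu[i1][j] - mu[i2][j] for every level j of factor B."""
    return [x1 - x2 for x1, x2 in zip(mu[i1], mu[i2])]

def is_additive(mu, tol=1e-9):
    """True if the difference between levels of A is constant over levels of B.

    Comparing every level with the first one suffices: if each such difference
    is constant, the difference between any two other levels is constant too.
    """
    for i in range(1, len(mu)):
        d = differences_between_a_levels(mu, i, 0)
        if max(d) - min(d) > tol:
            return False
    return True

# Hypothetical 2 x 3 tables of cell means
additive = [[10, 12, 15],
            [13, 15, 18]]      # level 2 of A is 3 units higher at every level of B
interacting = [[10, 12, 15],
               [13, 11, 20]]   # the A difference changes with the level of B

print(differences_between_a_levels(additive, 1, 0))     # [3, 3, 3]
print(differences_between_a_levels(interacting, 1, 0))  # [3, -1, 5]
print(is_additive(additive), is_additive(interacting))  # True False
```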


4.1 MATHEMATICAL MODEL 


Consider two factors A and B having a and b levels, respectively, and let there be n observations at each combination of levels of A and B (i.e., n observations in each of the a × b cells). Let $y_{ijk}$ be the k-th observation at the i-th level of A and the j-th level of B. The data, involving a total of N = abn observations $y_{ijk}$, can then be schematically represented as in Table 4.1.

The notation employed here is a straightforward extension of the notation of the preceding chapter. A dot in a subscript indicates aggregation, or totaling, over the variable represented by that index, and a dot together with a bar represents the corresponding mean. Thus, the sum of the observations corresponding to the i-th
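TABLE 4.1
Data for a Two-Way Crossed Classification with n Replications per Cell

$$
\begin{array}{c|ccccccc}
\text{Factor } A \,\backslash\, \text{Factor } B & B_1 & B_2 & B_3 & \cdots & B_j & \cdots & B_b \\
\hline
A_1 & y_{111}, y_{112}, \ldots, y_{11n} & y_{121}, y_{122}, \ldots, y_{12n} & y_{131}, y_{132}, \ldots, y_{13n} & \cdots & y_{1j1}, y_{1j2}, \ldots, y_{1jn} & \cdots & y_{1b1}, y_{1b2}, \ldots, y_{1bn} \\
A_2 & y_{211}, y_{212}, \ldots, y_{21n} & y_{221}, y_{222}, \ldots, y_{22n} & y_{231}, y_{232}, \ldots, y_{23n} & \cdots & y_{2j1}, y_{2j2}, \ldots, y_{2jn} & \cdots & y_{2b1}, y_{2b2}, \ldots, y_{2bn} \\
A_3 & y_{311}, y_{312}, \ldots, y_{31n} & y_{321}, y_{322}, \ldots, y_{32n} & y_{331}, y_{332}, \ldots, y_{33n} & \cdots & y_{3j1}, y_{3j2}, \ldots, y_{3jn} & \cdots & y_{3b1}, y_{3b2}, \ldots, y_{3bn} \\
\vdots & \vdots & \vdots & \vdots & & \vdots & & \vdots \\
A_i & y_{i11}, y_{i12}, \ldots, y_{i1n} & y_{i21}, y_{i22}, \ldots, y_{i2n} & y_{i31}, y_{i32}, \ldots, y_{i3n} & \cdots & y_{ij1}, y_{ij2}, \ldots, y_{ijn} & \cdots & y_{ib1}, y_{ib2}, \ldots, y_{ibn} \\
\vdots & \vdots & \vdots & \vdots & & \vdots & & \vdots \\
A_a & y_{a11}, y_{a12}, \ldots, y_{a1n} & y_{a21}, y_{a22}, \ldots, y_{a2n} & y_{a31}, y_{a32}, \ldots, y_{a3n} & \cdots & y_{aj1}, y_{aj2}, \ldots, y_{ajn} & \cdots & y_{ab1}, y_{ab2}, \ldots, y_{abn}
\end{array}
$$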
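level of factor A and the j-th level of factor B is

$$
y_{ij\cdot} = \sum_{k=1}^{n} y_{ijk}.
$$

The corresponding mean is

$$
\bar{y}_{ij\cdot} = \frac{y_{ij\cdot}}{n} = \frac{1}{n}\sum_{k=1}^{n} y_{ijk}.
$$

The total of all the observations for the i-th level of factor A is

$$
y_{i\cdot\cdot} = \sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk},
$$

and the corresponding mean is

$$
\bar{y}_{i\cdot\cdot} = \frac{y_{i\cdot\cdot}}{bn}.
$$

Similarly, for the j-th level of factor B, the sum of all the observations and the corresponding mean are denoted by

$$
y_{\cdot j\cdot} = \sum_{i=1}^{a}\sum_{k=1}^{n} y_{ijk}
$$

and

$$
\bar{y}_{\cdot j\cdot} = \frac{y_{\cdot j\cdot}}{an}.
$$

Finally, the sum of all the observations, the grand total, is

$$
y_{\cdot\cdot\cdot} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk},
$$

and the grand or overall mean is

$$
\bar{y}_{\cdot\cdot\cdot} = \frac{y_{\cdot\cdot\cdot}}{abn}.
$$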
The analysis of variance model for this type of experimental layout is given as

$$
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk},
\qquad
\begin{cases} i = 1, 2, \ldots, a \\ j = 1, 2, \ldots, b \\ k = 1, 2, \ldots, n, \end{cases}
\tag{4.1.1}
$$

where $-\infty < \mu < \infty$ is the overall mean, $\alpha_i$ is the effect due to the i-th level of factor A, $\beta_j$ is the effect due to the j-th level of factor B, $(\alpha\beta)_{ij}$ is the interaction effect representing the departure of the mean of the observations in the (i, j)-th cell, denoted by $\mu_{ij}$, from the sum of the first three terms of (4.1.1), and $e_{ijk}$ is the customary error term accounting for the random variation among the observations within a cell. The terms $\alpha_i$ and $\beta_j$ are known as main effects; they are the average effects corresponding to each level of factor A and each level of factor B. The term $(\alpha\beta)_{ij}$ is called an interaction effect. If the levels of factor A and factor B behave in a strictly additive manner, that is, if a level of factor A contributes a certain amount to the average yield irrespective of the level of factor B, we say that the $(\alpha\beta)_{ij}$'s are all zero. On the other hand, if a level of factor A, say, level 3, increases yield more with, say, level 1 than with level 2 of factor B, we say that $(\alpha\beta)_{31}$ is positive and $(\alpha\beta)_{32}$ is negative. The model (4.1.1) states that an observation $y_{ijk}$ consists of the following components:

(i) the overall mean $\mu$,
(ii) the main effect $\alpha_i$ for factor A at the i-th level,
(iii) the main effect $\beta_j$ for factor B at the j-th level,
(iv) the interaction effect $(\alpha\beta)_{ij}$ when factor A is at the i-th level and factor B is at the j-th level, and
(v) an error or residual term $e_{ijk}$, which is the deviation of a particular observation from the cell mean $\mu_{ij}$.
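To make the components of (4.1.1) concrete, the following Python sketch splits each observation into estimates of $\mu$, $\alpha_i$, $\beta_j$, $(\alpha\beta)_{ij}$, and $e_{ijk}$ built from the cell, row, column, and grand means defined above. It assumes equal replication n per cell and the Model I sum-to-zero constraints given in Section 4.2; these are the usual least-squares estimates under those constraints, which this section does not derive. The data and the function name are hypothetical, with a = 2, b = 3, and n = 2.

```python
from statistics import fmean

def decompose(y):
    """Split y[i][j][k] into estimates of mu, alpha_i, beta_j, (alpha beta)_ij, e_ijk.

    Assumes the same number n of observations in every cell, so the mean of
    the cell means in a row (column) equals the row (column) mean of the data.
    """
    a, b = len(y), len(y[0])
    cell = [[fmean(y[i][j]) for j in range(b)] for i in range(a)]   # ybar_ij.
    row = [fmean(cell[i]) for i in range(a)]                         # ybar_i..
    col = [fmean(cell[i][j] for i in range(a)) for j in range(b)]    # ybar_.j.
    grand = fmean(row)                                               # ybar_...
    alpha = [r - grand for r in row]
    beta = [c - grand for c in col]
    ab = [[cell[i][j] - row[i] - col[j] + grand for j in range(b)] for i in range(a)]
    resid = [[[v - cell[i][j] for v in y[i][j]] for j in range(b)] for i in range(a)]
    return grand, alpha, beta, ab, resid

# Hypothetical data: y[i][j] lists the n = 2 observations in cell (i, j)
y = [[[20, 22], [25, 27], [30, 28]],
     [[18, 20], [30, 32], [26, 24]]]
mu_hat, alpha, beta, ab, resid = decompose(y)
print(mu_hat, alpha, beta)
```

By construction, the estimated effects satisfy the sum-to-zero constraints, and adding the five components back together recovers every observation exactly.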


Remark: Two-way crossed classifications are widely used in many areas of scientific research and applications. Some examples are as follows:

(a) A city has a sources of a pollutant; n samples are taken from each source and sent to b laboratories for analysis of their chemical composition.

(b) An organism has a species; n females are taken from each species, and each one is used in b experiments in order to measure variation in progeny.

(c) An industrial production process involves a machines and b workers, and n samples of the product are taken from each machine × worker combination.


4.2 ASSUMPTIONS OF THE MODEL 


As in the case of model (3.1.1), the assumptions of the model (4.1.1) can be 
summarized as follows: 


(i) The errors $e_{ijk}$'s are randomly distributed with mean zero and common variance $\sigma_e^2$.
(ii) The errors associated with any pair of observations are uncorrelated.
(iii) Under Model I, the $\alpha_i$'s, $\beta_j$'s, and $(\alpha\beta)_{ij}$'s are assumed to be fixed constants subject to the constraints

$$\sum_{i=1}^{a}\alpha_i=\sum_{j=1}^{b}\beta_j=\sum_{i=1}^{a}(\alpha\beta)_{ij}=\sum_{j=1}^{b}(\alpha\beta)_{ij}=0.$$

(iv) Under Model II, the $\alpha_i$'s, $\beta_j$'s, and $(\alpha\beta)_{ij}$'s are assumed to be randomly distributed with zero means and variances $\sigma_\alpha^2$, $\sigma_\beta^2$, and $\sigma_{\alpha\beta}^2$, respectively. Furthermore, the $\alpha_i$'s, $\beta_j$'s, and $(\alpha\beta)_{ij}$'s are mutually and completely uncorrelated. In this case, $\sigma_\alpha^2$, $\sigma_\beta^2$, $\sigma_{\alpha\beta}^2$, and $\sigma_e^2$ are the variance components of the model (4.1.1), and inferences are sought about them.

(v) Although the distribution properties for Models I and II were fairly straightforward to enumerate, this is not the case for Model III. Generally, if any element in an interaction term is considered random, it may be appropriate to assume that the interaction term has a Model II effect. However, in that case, we must assume the corresponding distribution properties of the interaction terms. The proper error term used to test certain hypotheses and to construct certain confidence intervals will depend on the distribution properties being assumed.


There are several types of distribution properties that have been proposed as realistic for various experimental situations, and their full discussion is beyond the scope of this volume. The interested reader is referred to the works of Wilk (1955), Wilk and Kempthorne (1955, 1956), Scheffé (1956a,b), and Harville (1978). The articles by Hocking (1973) and Sahai (1988) provide an excellent summary of various mixed models. We assume the following distribution properties, and an analysis of variance for this situation is presented in succeeding sections. If an experimenter desires distribution properties other than those given here, tests and confidence intervals must be modified accordingly. One such alternate mixed model is discussed briefly in Section 4.19.

We assume that the $\alpha_i$'s are constants subject to the constraint $\sum_{i=1}^{a}\alpha_i=0$; the $\beta_j$'s are uncorrelated random variables with mean zero and variance $\sigma_\beta^2$; and the $(\alpha\beta)_{ij}$'s are random variables with mean zero and variance $[(a-1)/a]\sigma_{\alpha\beta}^2$, subject to the constraints $\sum_{i=1}^{a}(\alpha\beta)_{ij}=0$ for all $j$.² This introduces dependence between certain interaction terms at different levels of the fixed factor. In fact, one can show that (Graybill (1961, pp. 396-397))³

$$E[(\alpha\beta)_{ij}(\alpha\beta)_{i'j'}]=\begin{cases}\dfrac{a-1}{a}\,\sigma_{\alpha\beta}^2, & i=i',\ j=j',\\[4pt] -\dfrac{1}{a}\,\sigma_{\alpha\beta}^2, & i\neq i',\ j=j',\\[4pt] 0, & j\neq j'.\end{cases} \qquad (4.2.1)$$


1 Some statisticians have expressed concern over the fact that the interaction terms $(\alpha\beta)_{ij}$'s are assumed to be uncorrelated with the $\alpha_i$'s and $\beta_j$'s. However, the assumption is consistent with the results from the finite population models that define the interaction to be a function of the main effects (see, e.g., Cornfield and Tukey (1956); Scheffé (1959, Section 7.4)).

2 Since the $\alpha_i$'s are assumed to be fixed subject to the constraint that $\sum_{i=1}^{a}\alpha_i=0$, it is felt that it is reasonable to assume that the sum of the interaction terms over all the levels of factor A, within a given level of factor B, should be equal to zero.


Note that in this model the variance of $(\alpha\beta)_{ij}$ is defined as $[(a-1)/a]\sigma_{\alpha\beta}^2$ instead of $\sigma_{\alpha\beta}^2$ in order to simplify the expressions for expected mean squares. Furthermore, the $\beta_j$'s, $(\alpha\beta)_{ij}$'s, and $e_{ijk}$'s are uncorrelated with each other, and the $e_{ijk}$'s have mean zero and variance $\sigma_e^2$.
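The covariance structure (4.2.1) can be checked numerically. The sketch below assumes Python with NumPy. It builds the $a\times a$ covariance matrix of $((\alpha\beta)_{1j},\ldots,(\alpha\beta)_{aj})$ for one level $j$, which is $\sigma_{\alpha\beta}^2(I-J/a)$. It then compares that matrix with draws from one construction that satisfies (4.2.1): independent $N(0,\sigma_{\alpha\beta}^2)$ variables centred within each level of B. That construction, and the function names, are illustrative assumptions. The text does not require normality or this particular mechanism.

```python
import numpy as np

def interaction_cov(a, sigma2_ab):
    """Covariance matrix of ((ab)_1j, ..., (ab)_aj) for one level j, as in (4.2.1)."""
    return sigma2_ab * (np.eye(a) - np.ones((a, a)) / a)

def simulate_interactions(a, b, sigma2_ab, rng):
    """Hypothetical construction: centre iid normals within each column j.

    Each column then sums to zero over i, matching the Model III constraint.
    """
    g = rng.normal(scale=np.sqrt(sigma2_ab), size=(a, b))
    return g - g.mean(axis=0, keepdims=True)

a, b, s2 = 4, 3, 2.0
V = interaction_cov(a, s2)
print(V[0, 0], V[0, 1])                 # (a-1)/a * s2 and -s2/a
print(np.allclose(V.sum(axis=1), 0.0))  # rows sum to zero

rng = np.random.default_rng(0)
draws = np.array([simulate_interactions(a, b, s2, rng)[:, 0] for _ in range(20000)])
print(np.round(np.cov(draws, rowvar=False), 2))  # should be close to V
```

Because every row of $I-J/a$ sums to zero, the covariance matrix is singular. This is exactly the dependence that the constraint $\sum_i(\alpha\beta)_{ij}=0$ imposes.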

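The objective in this model is to test the hypotheses $\sigma_{\alpha\beta}^2=0$, $\sigma_\beta^2=0$, and all $\alpha_i=0$. A further objective is to find point and interval estimates for the variance components $\sigma_e^2$, $\sigma_{\alpha\beta}^2$, and $\sigma_\beta^2$, for the fixed effects $\alpha_i$'s, and for their contrasts.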


Remarks: (i) Under Model I, the assumptions that $\sum_{i=1}^{a}(\alpha\beta)_{ij}=0=\sum_{j=1}^{b}(\alpha\beta)_{ij}$ can be made without any loss of generality, because by definition $(\alpha\beta)_{ij}$ is a differential effect. Suppose the sum over $i$ for a given $j$ were equal to some nonzero constant $c$. We could then replace $(\alpha\beta)_{ij}$ with $(\alpha\beta)_{ij}-c/a$ and $\beta_j$ with $\beta_j+c/a$. The cell means are unchanged, and the resulting $(\alpha\beta)_{ij}$'s sum to zero. Similarly, the assumptions that $\sum_{i=1}^{a}\alpha_i=0=\sum_{j=1}^{b}\beta_j$ can be made without any loss of generality (see Remark at the end of Section 2.2).

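(ii) As indicated in the Remark of Section 3.2, the random effects model in (4.1.1) has three intraclass correlations:

- the correlation between observations within the same level of factor A;
- the correlation between observations within the same level of factor B;
- the correlation between observations within the same cell.

Denoting these correlations by $\rho_\alpha$, $\rho_\beta$, and $\rho_{\alpha\beta}$, we have

$$\rho_\alpha=\frac{\sigma_\alpha^2}{\sigma_e^2+\sigma_{\alpha\beta}^2+\sigma_\beta^2+\sigma_\alpha^2},\qquad \rho_\beta=\frac{\sigma_\beta^2}{\sigma_e^2+\sigma_{\alpha\beta}^2+\sigma_\beta^2+\sigma_\alpha^2},\qquad \rho_{\alpha\beta}=\frac{\sigma_{\alpha\beta}^2}{\sigma_e^2+\sigma_{\alpha\beta}^2+\sigma_\beta^2+\sigma_\alpha^2}.$$

Under the assumptions of the mixed model in (4.1.1), the correlation between the observations within the same cell is given by $\rho_{\alpha\beta}=\sigma_{\alpha\beta}^2/(\sigma_e^2+\sigma_{\alpha\beta}^2+\sigma_\beta^2)$.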


4.3 PARTITION OF THE TOTAL SUM OF SQUARES


To partition the total sum of squares, we start with the identity 


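$$y_{ijk}-\bar y_{...}=(\bar y_{i..}-\bar y_{...})+(\bar y_{.j.}-\bar y_{...})+(\bar y_{ij.}-\bar y_{i..}-\bar y_{.j.}+\bar y_{...})+(y_{ijk}-\bar y_{ij.}),$$

and square and sum over $i$, $j$, and $k$ to yield:

$$\begin{aligned}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(y_{ijk}-\bar y_{...})^2 &=\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}\big[(\bar y_{i..}-\bar y_{...})+(\bar y_{.j.}-\bar y_{...})+(\bar y_{ij.}-\bar y_{i..}-\bar y_{.j.}+\bar y_{...})+(y_{ijk}-\bar y_{ij.})\big]^2\\ &=\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(\bar y_{i..}-\bar y_{...})^2+\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(\bar y_{.j.}-\bar y_{...})^2\\ &\quad+\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(\bar y_{ij.}-\bar y_{i..}-\bar y_{.j.}+\bar y_{...})^2+\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(y_{ijk}-\bar y_{ij.})^2 \qquad (4.3.1)\\ &=bn\sum_{i=1}^{a}(\bar y_{i..}-\bar y_{...})^2+an\sum_{j=1}^{b}(\bar y_{.j.}-\bar y_{...})^2\\ &\quad+n\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar y_{ij.}-\bar y_{i..}-\bar y_{.j.}+\bar y_{...})^2+\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(y_{ijk}-\bar y_{ij.})^2, \qquad (4.3.2)\end{aligned}$$

where

$$\bar y_{ij.}=\frac{1}{n}\sum_{k=1}^{n}y_{ijk},\qquad \bar y_{i..}=\frac{1}{bn}\sum_{j=1}^{b}\sum_{k=1}^{n}y_{ijk},\qquad \bar y_{.j.}=\frac{1}{an}\sum_{i=1}^{a}\sum_{k=1}^{n}y_{ijk},\qquad \bar y_{...}=\frac{1}{abn}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}y_{ijk}.$$


3 This implies that the interaction terms are correlated with a covariance equal to $-(1/a)\sigma_{\alpha\beta}^2$, and that the magnitude of the covariance decreases as the number of levels of factor A increases.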


The identity (4.3.1) is valid since all cross-product terms are equal to zero.

The terms on the right of (4.3.1) are denoted by $SS_A$, $SS_B$, $SS_{AB}$, and $SS_E$, in that order. They measure the variation due to the $\alpha_i$'s, $\beta_j$'s, $(\alpha\beta)_{ij}$'s, and $e_{ijk}$'s, respectively. Writing $SS_T$ for the total sum of squares on the left, the identity states that we have partitioned the total variation $SS_T$ into the following components:


(i) $SS_A$, called the A-factor sum of squares, representing the variation in the $y_{ijk}$ due to the A-factor effects;

(ii) $SS_B$, called the B-factor sum of squares, representing variation in the $y_{ijk}$ due to the B-factor effects;

(iii) $SS_{AB}$, called the interaction sum of squares, representing variation in the $y_{ijk}$ due to the interaction effects; and

(iv) $SS_E$, called the error sum of squares, representing variation in the $y_{ijk}$ after removing A-factor effects, B-factor effects, and interaction effects.
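The partition (4.3.1)-(4.3.2) is easy to compute directly. The following is a minimal sketch, assuming the balanced data are held in a NumPy array `y` of shape `(a, b, n)` with `y[i, j, k]` $=y_{ijk}$. The function name `two_way_ss` is hypothetical. The sketch computes each component from the cell, row, column and grand means and checks that the four components add up to $SS_T$.

```python
import numpy as np

def two_way_ss(y):
    """Sums of squares of (4.3.2) for a balanced array y of shape (a, b, n)."""
    a, b, n = y.shape
    grand = y.mean()                  # ybar_...
    row = y.mean(axis=(1, 2))         # ybar_i.., shape (a,)
    col = y.mean(axis=(0, 2))         # ybar_.j., shape (b,)
    cell = y.mean(axis=2)             # ybar_ij., shape (a, b)

    ss_a = b * n * np.sum((row - grand) ** 2)
    ss_b = a * n * np.sum((col - grand) ** 2)
    ss_ab = n * np.sum((cell - row[:, None] - col[None, :] + grand) ** 2)
    ss_e = np.sum((y - cell[:, :, None]) ** 2)
    ss_t = np.sum((y - grand) ** 2)
    return {"A": ss_a, "B": ss_b, "AB": ss_ab, "E": ss_e, "T": ss_t}

rng = np.random.default_rng(1)
y = rng.normal(size=(3, 4, 2))        # simulated data: a = 3, b = 4, n = 2
ss = two_way_ss(y)
# The cross products vanish, so the components add up to the total.
print(np.isclose(ss["A"] + ss["B"] + ss["AB"] + ss["E"], ss["T"]))  # True
```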


Remark: In a two-way crossed classification, one can compute $ab$ separate variances, one for each cell, as $\sum_{k=1}^{n}(y_{ijk}-\bar y_{ij.})^2/(n-1)$. These can then be tested for homogeneity of variances (see Section 2.22).
4.4 MEAN SQUARES AND THEIR EXPECTATIONS


Mean squares are obtained in the usual way by dividing the sums of squares by their corresponding degrees of freedom. For the $SS_A$ and $SS_B$ terms, we have the conditions

$$\sum_{i=1}^{a}(\bar y_{i..}-\bar y_{...})=\sum_{j=1}^{b}(\bar y_{.j.}-\bar y_{...})=0, \qquad (4.4.1)$$

and so the degrees of freedom are $a-1$ and $b-1$, respectively. For the $SS_{AB}$ term, note that the random quantities $\theta_{ij}$ defined by

$$\theta_{ij}=\bar y_{ij.}-\bar y_{i..}-\bar y_{.j.}+\bar y_{...}$$

are subject to the conditions:

$$\sum_{i=1}^{a}\theta_{ij}=0,\quad\text{for each } j\ (b \text{ relations}), \qquad (4.4.2)$$

$$\sum_{j=1}^{b}\theta_{ij}=0,\quad\text{for each } i\ (a \text{ relations}). \qquad (4.4.3)$$
j=l 


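However, in effect, there are only $a+b-1$ independent restrictions on the $\theta_{ij}$'s. This is so since the $b$ restrictions (4.4.2), when summed, determine the relation

$$\sum_{j=1}^{b}\Big(\sum_{i=1}^{a}\theta_{ij}\Big)=0.$$

Similarly, the restrictions (4.4.3), when summed, must also give zero; that is,

$$\sum_{i=1}^{a}\Big(\sum_{j=1}^{b}\theta_{ij}\Big)=0.$$

Both relations are the same double sum $\sum_i\sum_j\theta_{ij}$, so the sum of the $a$ relations (4.4.3) is already implied by (4.4.2). Thus, only $a-1$ of the $a$ relations (4.4.3) will be independent, and the total number of independent restrictions on the $\theta_{ij}$'s is $a+b-1$. Since the number of $\theta_{ij}$'s is $ab$, the number of degrees of freedom associated with the $SS_{AB}$ term is

$$ab-(a+b-1)=(a-1)(b-1).$$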
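The counting argument can also be confirmed numerically. The sketch below assumes NumPy, and the function name is hypothetical. It writes each restriction (4.4.2) and (4.4.3) as a row of a $(a+b)\times ab$ matrix acting on the $\theta_{ij}$'s, and checks that the matrix has rank $a+b-1$. The number of free $\theta_{ij}$'s left over is then $(a-1)(b-1)$.

```python
import numpy as np

def interaction_constraint_rank(a, b):
    """Rank of the restrictions (4.4.2) and (4.4.3) on the ab quantities theta_ij."""
    rows = []
    for j in range(b):                  # (4.4.2): sum over i of theta_ij = 0
        r = np.zeros((a, b))
        r[:, j] = 1.0
        rows.append(r.ravel())
    for i in range(a):                  # (4.4.3): sum over j of theta_ij = 0
        r = np.zeros((a, b))
        r[i, :] = 1.0
        rows.append(r.ravel())
    return np.linalg.matrix_rank(np.array(rows))

for a, b in [(2, 3), (3, 4), (5, 5)]:
    rank = interaction_constraint_rank(a, b)
    # a + b relations but rank a + b - 1; ab - rank is the AB degrees of freedom
    print(a, b, rank, a * b - rank, (a - 1) * (b - 1))
```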


By subtraction from the total degrees of freedom $abn-1$, the number of degrees of freedom associated with the error term $SS_E$ is

$$(abn-1)-(a-1)-(b-1)-(a-1)(b-1)=ab(n-1).$$
The corresponding mean squares $MS_A$, $MS_B$, $MS_{AB}$, and $MS_E$ are, therefore, defined by

$$MS_A=\frac{SS_A}{a-1},\qquad MS_B=\frac{SS_B}{b-1},\qquad MS_{AB}=\frac{SS_{AB}}{(a-1)(b-1)},\qquad MS_E=\frac{SS_E}{ab(n-1)}.$$
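The degrees of freedom and mean squares above amount to a small piece of bookkeeping. The sketch below is written in plain Python. The sums of squares in the example are hypothetical numbers, and the function name is illustrative. The assertion inside the function restates the fact that the component degrees of freedom add up to $abn-1$.

```python
def mean_squares(ss, a, b, n):
    """Divide each sum of squares by its degrees of freedom (Section 4.4).

    ss is a dict with keys "A", "B", "AB", "E".
    """
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (n - 1)}
    # Sanity check: the component df add up to the total df, abn - 1.
    assert sum(df.values()) == a * b * n - 1
    ms = {k: ss[k] / df[k] for k in df}
    return ms, df

# Hypothetical sums of squares for a = 3, b = 4, n = 3.
ms, df = mean_squares({"A": 12.0, "B": 9.0, "AB": 6.0, "E": 24.0}, a=3, b=4, n=3)
print(df)  # {'A': 2, 'B': 3, 'AB': 6, 'E': 24}
print(ms)  # {'A': 6.0, 'B': 3.0, 'AB': 1.0, 'E': 1.0}
```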


Next, we examine the expectations of mean squares, which can be readily derived from the assumptions of the model (4.1.1) and the usual laws of expectations. On taking successive averages of the model equation (4.1.1), we obtain

$$\bar y_{ij.}=\mu+\alpha_i+\beta_j+(\alpha\beta)_{ij}+\bar e_{ij.}, \qquad (4.4.4)$$

$$\bar y_{i..}=\mu+\alpha_i+\bar\beta_{.}+\overline{(\alpha\beta)}_{i.}+\bar e_{i..}, \qquad (4.4.5)$$

$$\bar y_{.j.}=\mu+\bar\alpha_{.}+\beta_j+\overline{(\alpha\beta)}_{.j}+\bar e_{.j.}, \qquad (4.4.6)$$

and

$$\bar y_{...}=\mu+\bar\alpha_{.}+\bar\beta_{.}+\overline{(\alpha\beta)}_{..}+\bar e_{...}, \qquad (4.4.7)$$

where, as usual, the bars indicate means over the subscripts shown by dots. We substitute the values of $y_{ijk}$, $\bar y_{ij.}$, $\bar y_{i..}$, $\bar y_{.j.}$, and $\bar y_{...}$ given by (4.1.1), (4.4.4), (4.4.5), (4.4.6), and (4.4.7), respectively, into the expressions for the $SS_A$, $SS_B$, $SS_{AB}$, and $SS_E$ terms defined in (4.3.1). This gives the following expressions:


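$$SS_E=\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(e_{ijk}-\bar e_{ij.})^2, \qquad (4.4.8)$$

$$SS_{AB}=n\sum_{i=1}^{a}\sum_{j=1}^{b}\big[(\alpha\beta)_{ij}-\overline{(\alpha\beta)}_{i.}-\overline{(\alpha\beta)}_{.j}+\overline{(\alpha\beta)}_{..}+\bar e_{ij.}-\bar e_{i..}-\bar e_{.j.}+\bar e_{...}\big]^2, \qquad (4.4.9)$$

$$SS_B=an\sum_{j=1}^{b}\big[\beta_j-\bar\beta_{.}+\overline{(\alpha\beta)}_{.j}-\overline{(\alpha\beta)}_{..}+\bar e_{.j.}-\bar e_{...}\big]^2, \qquad (4.4.10)$$

and

$$SS_A=bn\sum_{i=1}^{a}\big[\alpha_i-\bar\alpha_{.}+\overline{(\alpha\beta)}_{i.}-\overline{(\alpha\beta)}_{..}+\bar e_{i..}-\bar e_{...}\big]^2. \qquad (4.4.11)$$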
i=] 
Now, because the $e_{ijk}$'s are uncorrelated and identically distributed with mean zero and variance $\sigma_e^2$, it follows that

$$E(e_{ijk}^2)=\sigma_e^2, \qquad (4.4.12)$$

$$E(\bar e_{ij.}^2)=\sigma_e^2/n, \qquad (4.4.13)$$

$$E(\bar e_{i..}^2)=\sigma_e^2/bn, \qquad (4.4.14)$$

$$E(\bar e_{.j.}^2)=\sigma_e^2/an, \qquad (4.4.15)$$

and

$$E(\bar e_{...}^2)=\sigma_e^2/abn. \qquad (4.4.16)$$


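It is then a matter of straightforward computation to derive the expectations of mean squares. First, taking the expectation of (4.4.8), we obtain

$$E(SS_E)=\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}E(e_{ijk}-\bar e_{ij.})^2=ab(n-1)\sigma_e^2, \qquad (4.4.17)$$

since $E(e_{ijk}\bar e_{ij.})=E(\bar e_{ij.}^2)=\sigma_e^2/n$, so that each of the $abn$ terms has expectation $\sigma_e^2-\sigma_e^2/n=(n-1)\sigma_e^2/n$.

The expectation of $MS_E$ is, therefore, given by

$$E(MS_E)=E\Big(\frac{SS_E}{ab(n-1)}\Big)=\sigma_e^2. \qquad (4.4.18)$$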


Note that the result (4.4.18) is true under the assumptions of the fixed and 
random, as well as mixed effects models. 

Now, to derive the expectations of $MS_{AB}$, $MS_B$, and $MS_A$, we consider the cases of Models I, II, and III separately.


MODEL I (FIXED EFFECTS)


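Under Model I, the $\alpha_i$'s, $\beta_j$'s, and $(\alpha\beta)_{ij}$'s are fixed quantities with the restrictions that $\bar\alpha_{.}=\bar\beta_{.}=\overline{(\alpha\beta)}_{i.}=\overline{(\alpha\beta)}_{.j}=\overline{(\alpha\beta)}_{..}=0$. Therefore, the expressions (4.4.9) through (4.4.11) reduce to

$$SS_{AB}=n\sum_{i=1}^{a}\sum_{j=1}^{b}\big[(\alpha\beta)_{ij}+\bar e_{ij.}-\bar e_{i..}-\bar e_{.j.}+\bar e_{...}\big]^2, \qquad (4.4.19)$$

$$SS_B=an\sum_{j=1}^{b}\big[\beta_j+\bar e_{.j.}-\bar e_{...}\big]^2, \qquad (4.4.20)$$

and

$$SS_A=bn\sum_{i=1}^{a}\big[\alpha_i+\bar e_{i..}-\bar e_{...}\big]^2. \qquad (4.4.21)$$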

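On taking the expectation of (4.4.19), we obtain

$$\begin{aligned}E(SS_{AB})&=n\sum_{i=1}^{a}\sum_{j=1}^{b}E\big[(\alpha\beta)_{ij}+\bar e_{ij.}-\bar e_{i..}-\bar e_{.j.}+\bar e_{...}\big]^2\\&=n\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2+n\sum_{i=1}^{a}\sum_{j=1}^{b}E(\bar e_{ij.}-\bar e_{i..}-\bar e_{.j.}+\bar e_{...})^2,\end{aligned} \qquad (4.4.22)$$

since the $(\alpha\beta)_{ij}$'s are constants and the expectation of the cross-product term is zero. By proceeding as in the derivation of the result (3.4.11), it can be readily shown that

$$\sum_{i=1}^{a}\sum_{j=1}^{b}E(\bar e_{ij.}-\bar e_{i..}-\bar e_{.j.}+\bar e_{...})^2=(a-1)(b-1)\frac{\sigma_e^2}{n}. \qquad (4.4.23)$$

Then, on substituting (4.4.23) into (4.4.22), we obtain

$$E(SS_{AB})=n\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2+(a-1)(b-1)\sigma_e^2. \qquad (4.4.24)$$

Therefore, the expectation of $MS_{AB}$ is given by

$$E(MS_{AB})=E\Big(\frac{SS_{AB}}{(a-1)(b-1)}\Big)=\sigma_e^2+\frac{n}{(a-1)(b-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2. \qquad (4.4.25)$$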
To find the expectation of $MS_B$, we note from (4.4.20) that

$$E(SS_B)=an\Big[\sum_{j=1}^{b}\beta_j^2+E\sum_{j=1}^{b}(\bar e_{.j.}-\bar e_{...})^2\Big], \qquad (4.4.26)$$

since the $\beta_j$'s are constants and the expectation of the cross-product term is zero. Now, using the results (4.4.15) and (4.4.16), we find that

$$E\sum_{j=1}^{b}(\bar e_{.j.}-\bar e_{...})^2=\sum_{j=1}^{b}E(\bar e_{.j.}^2)-bE(\bar e_{...}^2)=b\frac{\sigma_e^2}{an}-b\frac{\sigma_e^2}{abn}=\frac{b-1}{an}\sigma_e^2. \qquad (4.4.27)$$

Then, on substituting (4.4.27) into (4.4.26), we obtain

$$E(SS_B)=an\sum_{j=1}^{b}\beta_j^2+(b-1)\sigma_e^2. \qquad (4.4.28)$$

Therefore, the expectation of $MS_B$ is given by

$$E(MS_B)=E\Big(\frac{SS_B}{b-1}\Big)=\sigma_e^2+\frac{an}{b-1}\sum_{j=1}^{b}\beta_j^2. \qquad (4.4.29)$$

Finally, from symmetry, it follows that the expectation of $MS_A$ is given by

$$E(MS_A)=\sigma_e^2+\frac{bn}{a-1}\sum_{i=1}^{a}\alpha_i^2. \qquad (4.4.30)$$


MODEL II (RANDOM EFFECTS) 


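Under Model II, the $\alpha_i$'s, $\beta_j$'s, and $(\alpha\beta)_{ij}$'s are mutually and completely uncorrelated random variables with mean zero and variances $\sigma_\alpha^2$, $\sigma_\beta^2$, and $\sigma_{\alpha\beta}^2$, respectively. It then follows, using the formulae for the variances of the sampling distributions of the means of the $\alpha_i$'s, $\beta_j$'s, and $(\alpha\beta)_{ij}$'s, that

$$E(\alpha_i^2)=\sigma_\alpha^2, \qquad (4.4.31)$$
$$E(\bar\alpha_{.}^2)=\sigma_\alpha^2/a, \qquad (4.4.32)$$
$$E(\beta_j^2)=\sigma_\beta^2, \qquad (4.4.33)$$
$$E(\bar\beta_{.}^2)=\sigma_\beta^2/b, \qquad (4.4.34)$$
$$E[(\alpha\beta)_{ij}^2]=\sigma_{\alpha\beta}^2, \qquad (4.4.35)$$
$$E\big[\overline{(\alpha\beta)}_{i.}^2\big]=\sigma_{\alpha\beta}^2/b, \qquad (4.4.36)$$
$$E\big[\overline{(\alpha\beta)}_{.j}^2\big]=\sigma_{\alpha\beta}^2/a, \qquad (4.4.37)$$
and
$$E\big[\overline{(\alpha\beta)}_{..}^2\big]=\sigma_{\alpha\beta}^2/ab. \qquad (4.4.38)$$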


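First, taking the expectation of (4.4.9), we obtain

$$\begin{aligned}E(SS_{AB})&=n\sum_{i=1}^{a}\sum_{j=1}^{b}E\big[(\alpha\beta)_{ij}-\overline{(\alpha\beta)}_{i.}-\overline{(\alpha\beta)}_{.j}+\overline{(\alpha\beta)}_{..}\big]^2\\&\quad+n\sum_{i=1}^{a}\sum_{j=1}^{b}E\big[\bar e_{ij.}-\bar e_{i..}-\bar e_{.j.}+\bar e_{...}\big]^2,\end{aligned} \qquad (4.4.39)$$

since the expectation of the cross-product term is zero. By proceeding as in the derivation of the result (3.4.11), it can be shown that

$$\sum_{i=1}^{a}\sum_{j=1}^{b}E\big[(\alpha\beta)_{ij}-\overline{(\alpha\beta)}_{i.}-\overline{(\alpha\beta)}_{.j}+\overline{(\alpha\beta)}_{..}\big]^2=(a-1)(b-1)\sigma_{\alpha\beta}^2 \qquad (4.4.40)$$

and

$$\sum_{i=1}^{a}\sum_{j=1}^{b}E\big[\bar e_{ij.}-\bar e_{i..}-\bar e_{.j.}+\bar e_{...}\big]^2=(a-1)(b-1)\frac{\sigma_e^2}{n}. \qquad (4.4.41)$$

On substituting (4.4.40) and (4.4.41) into (4.4.39), we obtain

$$E(SS_{AB})=(a-1)(b-1)[\sigma_e^2+n\sigma_{\alpha\beta}^2]. \qquad (4.4.42)$$

Therefore, the expectation of $MS_{AB}$ is given by

$$E(MS_{AB})=E\Big(\frac{SS_{AB}}{(a-1)(b-1)}\Big)=\sigma_e^2+n\sigma_{\alpha\beta}^2. \qquad (4.4.43)$$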
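Next, taking the expectation of (4.4.10), we obtain

$$E(SS_B)=an\Big[E\sum_{j=1}^{b}(\beta_j-\bar\beta_{.})^2+E\sum_{j=1}^{b}\big(\overline{(\alpha\beta)}_{.j}-\overline{(\alpha\beta)}_{..}\big)^2+E\sum_{j=1}^{b}(\bar e_{.j.}-\bar e_{...})^2\Big], \qquad (4.4.44)$$

since again the expectations of the cross-product terms are zero. Using the results (4.4.33) and (4.4.34), it follows that

$$E\sum_{j=1}^{b}(\beta_j-\bar\beta_{.})^2=(b-1)\sigma_\beta^2. \qquad (4.4.45)$$

Similarly, using (4.4.37) and (4.4.38), we have

$$E\sum_{j=1}^{b}\big(\overline{(\alpha\beta)}_{.j}-\overline{(\alpha\beta)}_{..}\big)^2=(b-1)\frac{\sigma_{\alpha\beta}^2}{a}; \qquad (4.4.46)$$

and, finally, using (4.4.15) and (4.4.16), we have

$$E\sum_{j=1}^{b}(\bar e_{.j.}-\bar e_{...})^2=(b-1)\frac{\sigma_e^2}{an}. \qquad (4.4.47)$$

Substituting (4.4.45), (4.4.46), and (4.4.47) into (4.4.44), we obtain

$$E(SS_B)=(b-1)[\sigma_e^2+n\sigma_{\alpha\beta}^2+an\sigma_\beta^2]. \qquad (4.4.48)$$

The expectation of $MS_B$ is, therefore, given by

$$E(MS_B)=E\Big(\frac{SS_B}{b-1}\Big)=\sigma_e^2+n\sigma_{\alpha\beta}^2+an\sigma_\beta^2. \qquad (4.4.49)$$

Finally, from symmetry, it follows that

$$E(MS_A)=\sigma_e^2+n\sigma_{\alpha\beta}^2+bn\sigma_\alpha^2. \qquad (4.4.50)$$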
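The Model II results (4.4.43), (4.4.49), and (4.4.50) can be checked by simulation. The sketch below assumes NumPy. For convenience it draws all random effects from normal distributions; the expectations themselves do not need normality. The parameter values, $\mu=10$, and the function names are arbitrary illustrative choices. The Monte Carlo averages of the four mean squares should land close to the theoretical expectations.

```python
import numpy as np

def mean_squares(y):
    """MS_A, MS_B, MS_AB, MS_E for a balanced array y of shape (a, b, n)."""
    a, b, n = y.shape
    g = y.mean()
    r = y.mean(axis=(1, 2))
    c = y.mean(axis=(0, 2))
    m = y.mean(axis=2)
    ms_a = b * n * np.sum((r - g) ** 2) / (a - 1)
    ms_b = a * n * np.sum((c - g) ** 2) / (b - 1)
    ms_ab = n * np.sum((m - r[:, None] - c[None, :] + g) ** 2) / ((a - 1) * (b - 1))
    ms_e = np.sum((y - m[:, :, None]) ** 2) / (a * b * (n - 1))
    return np.array([ms_a, ms_b, ms_ab, ms_e])

def average_ms_model_ii(a, b, n, s2a, s2b, s2ab, s2e, reps=4000, seed=0):
    """Monte Carlo average of the mean squares under Model II (normal effects assumed)."""
    rng = np.random.default_rng(seed)
    total = np.zeros(4)
    for _ in range(reps):
        alpha = rng.normal(0.0, np.sqrt(s2a), size=(a, 1, 1))
        beta = rng.normal(0.0, np.sqrt(s2b), size=(1, b, 1))
        ab = rng.normal(0.0, np.sqrt(s2ab), size=(a, b, 1))
        e = rng.normal(0.0, np.sqrt(s2e), size=(a, b, n))
        total += mean_squares(10.0 + alpha + beta + ab + e)  # mu = 10 (arbitrary)
    return total / reps

a, b, n = 4, 5, 3
s2a, s2b, s2ab, s2e = 2.0, 1.0, 0.5, 1.0
expected = np.array([
    s2e + n * s2ab + b * n * s2a,   # E(MS_A), (4.4.50)
    s2e + n * s2ab + a * n * s2b,   # E(MS_B), (4.4.49)
    s2e + n * s2ab,                 # E(MS_AB), (4.4.43)
    s2e,                            # E(MS_E), (4.4.18)
])
print(expected)
print(average_ms_model_ii(a, b, n, s2a, s2b, s2ab, s2e))
```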


MODEL III (MIXED EFFECTS) 


Under Model III, the $\alpha_i$'s are constants with the restriction that $\sum_{i=1}^{a}\alpha_i=0$, and the $\beta_j$'s are uncorrelated random variables with mean zero and variance $\sigma_\beta^2$. The $(\alpha\beta)_{ij}$'s are random variables with mean zero and variance-covariance structure given by (4.2.1), subject to the constraints $\sum_{i=1}^{a}(\alpha\beta)_{ij}=0$ for all $j$. Furthermore, the $\beta_j$'s, $(\alpha\beta)_{ij}$'s, and $e_{ijk}$'s are uncorrelated with each other. It then follows, using the formulae for the variances of the sampling distributions of the means of the $\beta_j$'s and $(\alpha\beta)_{ij}$'s, that

$$E(\beta_j^2)=\sigma_\beta^2, \qquad (4.4.51)$$
$$E(\bar\beta_{.}^2)=\sigma_\beta^2/b, \qquad (4.4.52)$$
$$E[(\alpha\beta)_{ij}^2]=\frac{a-1}{a}\sigma_{\alpha\beta}^2, \qquad (4.4.53)$$
and
$$E\big[\overline{(\alpha\beta)}_{i.}^2\big]=\frac{a-1}{ab}\sigma_{\alpha\beta}^2. \qquad (4.4.54)$$
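Now, under the restrictions that $\bar\alpha_{.}=\overline{(\alpha\beta)}_{.j}=\overline{(\alpha\beta)}_{..}=0$, the expressions (4.4.9) through (4.4.11) reduce to

$$SS_{AB}=n\sum_{i=1}^{a}\sum_{j=1}^{b}\big[(\alpha\beta)_{ij}-\overline{(\alpha\beta)}_{i.}+\bar e_{ij.}-\bar e_{i..}-\bar e_{.j.}+\bar e_{...}\big]^2, \qquad (4.4.55)$$

$$SS_B=an\sum_{j=1}^{b}\big[\beta_j-\bar\beta_{.}+\bar e_{.j.}-\bar e_{...}\big]^2, \qquad (4.4.56)$$

and

$$SS_A=bn\sum_{i=1}^{a}\big[\alpha_i+\overline{(\alpha\beta)}_{i.}+\bar e_{i..}-\bar e_{...}\big]^2. \qquad (4.4.57)$$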

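First, taking the expectation of (4.4.55), we obtain

$$E(SS_{AB})=n\sum_{i=1}^{a}\sum_{j=1}^{b}E\big[(\alpha\beta)_{ij}-\overline{(\alpha\beta)}_{i.}\big]^2+n\sum_{i=1}^{a}\sum_{j=1}^{b}E\big[\bar e_{ij.}-\bar e_{i..}-\bar e_{.j.}+\bar e_{...}\big]^2, \qquad (4.4.58)$$

since the expectation of the cross-product term is zero. Using the results (4.4.53) and (4.4.54), we obtain

$$\begin{aligned}\sum_{i=1}^{a}\sum_{j=1}^{b}E\big[(\alpha\beta)_{ij}-\overline{(\alpha\beta)}_{i.}\big]^2&=\sum_{i=1}^{a}\Big[\sum_{j=1}^{b}E(\alpha\beta)_{ij}^2-bE\big(\overline{(\alpha\beta)}_{i.}^2\big)\Big]\\&=\sum_{i=1}^{a}\Big[b\,\frac{a-1}{a}\sigma_{\alpha\beta}^2-b\,\frac{a-1}{ab}\sigma_{\alpha\beta}^2\Big]\\&=(a-1)(b-1)\sigma_{\alpha\beta}^2.\end{aligned} \qquad (4.4.59)$$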
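Furthermore, as shown in (3.4.11), we have

$$\sum_{i=1}^{a}\sum_{j=1}^{b}E\big[\bar e_{ij.}-\bar e_{i..}-\bar e_{.j.}+\bar e_{...}\big]^2=(a-1)(b-1)\frac{\sigma_e^2}{n}. \qquad (4.4.60)$$

Substituting (4.4.59) and (4.4.60) into (4.4.58), we obtain

$$E(SS_{AB})=(a-1)(b-1)[\sigma_e^2+n\sigma_{\alpha\beta}^2]. \qquad (4.4.61)$$

Therefore, the expectation of $MS_{AB}$ is given by

$$E(MS_{AB})=E\Big(\frac{SS_{AB}}{(a-1)(b-1)}\Big)=\sigma_e^2+n\sigma_{\alpha\beta}^2. \qquad (4.4.62)$$

Next, taking the expectation of (4.4.56), we have

$$E(SS_B)=an\Big[E\sum_{j=1}^{b}(\beta_j-\bar\beta_{.})^2+E\sum_{j=1}^{b}(\bar e_{.j.}-\bar e_{...})^2\Big]. \qquad (4.4.63)$$

As in (4.4.45) and (4.4.47), it is easy to verify that

$$E\sum_{j=1}^{b}(\beta_j-\bar\beta_{.})^2=(b-1)\sigma_\beta^2 \qquad (4.4.64)$$

and

$$E\sum_{j=1}^{b}(\bar e_{.j.}-\bar e_{...})^2=(b-1)\frac{\sigma_e^2}{an}. \qquad (4.4.65)$$

Substituting (4.4.64) and (4.4.65) into (4.4.63), we obtain

$$E(SS_B)=(b-1)[\sigma_e^2+an\sigma_\beta^2]. \qquad (4.4.66)$$

Hence, the expectation of $MS_B$ is given by

$$E(MS_B)=E\Big(\frac{SS_B}{b-1}\Big)=\sigma_e^2+an\sigma_\beta^2. \qquad (4.4.67)$$

Finally, taking the expectation of (4.4.57), we obtain

$$E(SS_A)=bn\Big[\sum_{i=1}^{a}\alpha_i^2+E\sum_{i=1}^{a}\overline{(\alpha\beta)}_{i.}^2+E\sum_{i=1}^{a}(\bar e_{i..}-\bar e_{...})^2\Big], \qquad (4.4.68)$$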
since the $\alpha_i$'s are constants and the expectations of the cross-product terms are zero. From (4.4.14), (4.4.16), and (4.4.54), it readily follows that

$$E\sum_{i=1}^{a}\overline{(\alpha\beta)}_{i.}^2=\frac{a-1}{b}\sigma_{\alpha\beta}^2 \qquad (4.4.69)$$

and

$$E\sum_{i=1}^{a}(\bar e_{i..}-\bar e_{...})^2=(a-1)\frac{\sigma_e^2}{bn}. \qquad (4.4.70)$$

Substituting (4.4.69) and (4.4.70) into (4.4.68), we obtain

$$E(SS_A)=(a-1)\sigma_e^2+(a-1)n\sigma_{\alpha\beta}^2+bn\sum_{i=1}^{a}\alpha_i^2.$$

Hence, the expectation of $MS_A$ is given by

$$E(MS_A)=E\Big(\frac{SS_A}{a-1}\Big)=\sigma_e^2+n\sigma_{\alpha\beta}^2+\frac{bn}{a-1}\sum_{i=1}^{a}\alpha_i^2.$$
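A simulation can likewise illustrate the Model III expectations, in particular that $E(MS_B)=\sigma_e^2+an\sigma_\beta^2$ contains no interaction component. The sketch below assumes NumPy and normal random terms. It generates the $(\alpha\beta)_{ij}$'s by centring independent $N(0,\sigma_{\alpha\beta}^2)$ draws within each level of B. That is one hypothetical construction satisfying (4.2.1), not something prescribed by the text. The fixed effects, parameter values and function names are also illustrative.

```python
import numpy as np

def mean_squares(y):
    """MS_A, MS_B, MS_AB, MS_E for a balanced array y of shape (a, b, n)."""
    a, b, n = y.shape
    g = y.mean()
    r = y.mean(axis=(1, 2))
    c = y.mean(axis=(0, 2))
    m = y.mean(axis=2)
    ms_a = b * n * np.sum((r - g) ** 2) / (a - 1)
    ms_b = a * n * np.sum((c - g) ** 2) / (b - 1)
    ms_ab = n * np.sum((m - r[:, None] - c[None, :] + g) ** 2) / ((a - 1) * (b - 1))
    ms_e = np.sum((y - m[:, :, None]) ** 2) / (a * b * (n - 1))
    return np.array([ms_a, ms_b, ms_ab, ms_e])

def average_ms_model_iii(alpha, b, n, s2b, s2ab, s2e, reps=4000, seed=1):
    """Monte Carlo average of the mean squares under Model III (A fixed, B random)."""
    alpha = np.asarray(alpha, dtype=float)
    a = alpha.size
    rng = np.random.default_rng(seed)
    total = np.zeros(4)
    for _ in range(reps):
        beta = rng.normal(0.0, np.sqrt(s2b), size=(1, b, 1))
        g = rng.normal(0.0, np.sqrt(s2ab), size=(a, b, 1))
        ab = g - g.mean(axis=0, keepdims=True)   # satisfies (4.2.1); sums to 0 over i
        e = rng.normal(0.0, np.sqrt(s2e), size=(a, b, n))
        total += mean_squares(10.0 + alpha[:, None, None] + beta + ab + e)
    return total / reps

alpha = [1.0, -1.0, 0.5, -0.5]          # hypothetical fixed effects summing to zero
b, n = 5, 3
s2b, s2ab, s2e = 1.0, 0.5, 1.0
a = len(alpha)
sum_alpha2 = sum(x * x for x in alpha)
expected = np.array([
    s2e + n * s2ab + b * n * sum_alpha2 / (a - 1),   # E(MS_A) under Model III
    s2e + a * n * s2b,                              # E(MS_B), (4.4.67)
    s2e + n * s2ab,                                 # E(MS_AB), (4.4.62)
    s2e,                                            # E(MS_E), (4.4.18)
])
print(expected)
print(average_ms_model_iii(alpha, b, n, s2b, s2ab, s2e))
```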


The foregoing results of Sections 4.3 and 4.4 can now be summarized in a 
tabular form as the analysis of variance table shown in Table 4.2. 


4.5 SAMPLING DISTRIBUTION OF MEAN SQUARES 


In this section, we give the distribution results on mean squares for the fixed, random, and mixed effects models. The derivation of these results is beyond the scope of this volume and can be found in Scheffé (1959, pp. 109-112), Graybill (1961, pp. 397-402; 1976, pp. 630-632), and Searle et al. (1992, pp. 131-132). In deriving the expected mean squares we made no distributional assumption about the form of the random components of the model (4.1.1). To derive their sampling distributions, however, we do require the assumption of normality.


MODEL I (FIXED EFFECTS)
Under the distribution assumptions of Model I, it can be shown that:


(a) The quantities $MS_E$, $MS_{AB}$, $MS_B$, and $MS_A$ are statistically independent.
(b) The following results are true:

(i)
$$\frac{MS_E}{\sigma_e^2}\sim\frac{\chi^2[ab(n-1)]}{ab(n-1)}; \qquad (4.5.1)$$
(ii)
$$\frac{MS_{AB}}{\sigma_e^2}\sim\frac{\chi'^2[(a-1)(b-1),\lambda_{AB}]}{(a-1)(b-1)}; \qquad (4.5.2)$$
(iii)
$$\frac{MS_B}{\sigma_e^2}\sim\frac{\chi'^2[b-1,\lambda_B]}{b-1}; \qquad (4.5.3)$$
and
(iv)
$$\frac{MS_A}{\sigma_e^2}\sim\frac{\chi'^2[a-1,\lambda_A]}{a-1}, \qquad (4.5.4)$$

where, as usual, $\chi^2[\cdot]$ denotes a central and $\chi'^2[\cdot,\cdot]$ a noncentral chi-square variable, with the respective degrees of freedom and noncentrality parameters $\lambda_{AB}$, $\lambda_B$, and $\lambda_A$ defined by

$$\lambda_{AB}=\frac{n}{2\sigma_e^2}\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2,\qquad \lambda_B=\frac{an}{2\sigma_e^2}\sum_{j=1}^{b}\beta_j^2,\qquad\text{and}\qquad \lambda_A=\frac{bn}{2\sigma_e^2}\sum_{i=1}^{a}\alpha_i^2.$$

It, therefore, follows from (4.5.2) through (4.5.4) that:

(ii)' If $(\alpha\beta)_{ij}=0$ for all $i$ and $j$, then
$$\frac{MS_{AB}}{\sigma_e^2}\sim\frac{\chi^2[(a-1)(b-1)]}{(a-1)(b-1)}; \qquad (4.5.5)$$
(iii)' If $\beta_j=0$ for all $j$, then
$$\frac{MS_B}{\sigma_e^2}\sim\frac{\chi^2[b-1]}{b-1}; \qquad (4.5.6)$$
(iv)' If $\alpha_i=0$ for all $i$, then
$$\frac{MS_A}{\sigma_e^2}\sim\frac{\chi^2[a-1]}{a-1}. \qquad (4.5.7)$$


MODEL II (RANDOM EFFECTS) 
Under the distribution assumptions of Model II, it can be shown that: 


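(a) The quantities $MS_E$, $MS_{AB}$, $MS_B$, and $MS_A$ are statistically independent.
(b) The following results are true:
(i)
$$\frac{MS_E}{\sigma_e^2}\sim\frac{\chi^2[ab(n-1)]}{ab(n-1)}; \qquad (4.5.8)$$
(ii)
$$\frac{MS_{AB}}{\sigma_e^2+n\sigma_{\alpha\beta}^2}\sim\frac{\chi^2[(a-1)(b-1)]}{(a-1)(b-1)}; \qquad (4.5.9)$$
(iii)
$$\frac{MS_B}{\sigma_e^2+n\sigma_{\alpha\beta}^2+an\sigma_\beta^2}\sim\frac{\chi^2[b-1]}{b-1}; \qquad (4.5.10)$$
and
(iv)
$$\frac{MS_A}{\sigma_e^2+n\sigma_{\alpha\beta}^2+bn\sigma_\alpha^2}\sim\frac{\chi^2[a-1]}{a-1}. \qquad (4.5.11)$$

In words, each mean square divided by its expectation is a central chi-square variable divided by its degrees of freedom:

- the ratio of $MS_E$ to $\sigma_e^2$ is a $\chi^2[ab(n-1)]$ variable divided by $ab(n-1)$;
- the ratio of $MS_{AB}$ to $\sigma_e^2+n\sigma_{\alpha\beta}^2$ is a $\chi^2[(a-1)(b-1)]$ variable divided by $(a-1)(b-1)$;
- the ratio of $MS_B$ to $\sigma_e^2+n\sigma_{\alpha\beta}^2+an\sigma_\beta^2$ is a $\chi^2[b-1]$ variable divided by $b-1$;
- the ratio of $MS_A$ to $\sigma_e^2+n\sigma_{\alpha\beta}^2+bn\sigma_\alpha^2$ is a $\chi^2[a-1]$ variable divided by $a-1$.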


MODEL III (MIXED EFFECTS) 


Under the distribution assumptions of Model III, it can be shown that: 
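(a) The quantities $MS_E$, $MS_{AB}$, $MS_B$, and $MS_A$ are statistically independent.
(b) The following results are true:
(i)
$$\frac{MS_E}{\sigma_e^2}\sim\frac{\chi^2[ab(n-1)]}{ab(n-1)}; \qquad (4.5.12)$$
(ii)
$$\frac{MS_{AB}}{\sigma_e^2+n\sigma_{\alpha\beta}^2}\sim\frac{\chi^2[(a-1)(b-1)]}{(a-1)(b-1)}; \qquad (4.5.13)$$
(iii)
$$\frac{MS_B}{\sigma_e^2+an\sigma_\beta^2}\sim\frac{\chi^2[b-1]}{b-1}; \qquad (4.5.14)$$
and
(iv)
$$\frac{MS_A}{\sigma_e^2+n\sigma_{\alpha\beta}^2}\sim\frac{\chi'^2[a-1,\lambda_A]}{a-1}, \qquad (4.5.15)$$
where
$$\lambda_A=\frac{bn}{2(\sigma_e^2+n\sigma_{\alpha\beta}^2)}\sum_{i=1}^{a}\alpha_i^2.$$

It then follows from (4.5.15) that if $\alpha_i=0$ for all $i$, then
(iv)'
$$\frac{MS_A}{\sigma_e^2+n\sigma_{\alpha\beta}^2}\sim\frac{\chi^2[a-1]}{a-1}. \qquad (4.5.16)$$


4.6 TESTS OF HYPOTHESES: THE ANALYSIS OF VARIANCE F TESTS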


In this section, we present the usual hypotheses of interest and appropriate F tests for fixed, random, and mixed effects models. As usual, the test statistic compares two mean squares that have the same expectation under $H_0$. The numerator mean square is chosen so that it has a larger expectation than the denominator mean square under $H_1$.


MODEL I (FIXED EFFECTS)


The usual tests of hypotheses of interest are about AB interactions, factor B 
effects, and factor A effects. 


Test for AB Interactions 
Ordinarily the two-way classification study begins with a test to determine whether the two factors interact. The hypothesis is

$$H_0^{AB}:\text{all }(\alpha\beta)_{ij}=0\quad\text{versus}\quad H_1^{AB}:\text{not all }(\alpha\beta)_{ij}\text{'s are zero}. \qquad (4.6.1)$$


In order to develop a test procedure for the hypothesis (4.6.1), we note from (4.4.18) and (4.4.25) that under $H_0^{AB}$,

$$E(MS_E)=\sigma_e^2,\qquad E(MS_{AB})=\sigma_e^2;$$

and under $H_1^{AB}$,

$$E(MS_{AB})>E(MS_E).$$

Furthermore, it follows from (4.5.1) and (4.5.5) that under $H_0^{AB}$,

$$F_{AB}=\frac{MS_{AB}/\sigma_e^2}{MS_E/\sigma_e^2}=\frac{MS_{AB}}{MS_E}\sim F[(a-1)(b-1),ab(n-1)]. \qquad (4.6.2)$$

Thus, the statistic (4.6.2) provides a suitable test procedure for (4.6.1), $H_0^{AB}$ being rejected if

$$F_{AB}>F[(a-1)(b-1),ab(n-1);1-\alpha].$$


Test for Factor B Effects 
The hypothesis is

$$H_0^{B}:\text{all }\beta_j=0\quad\text{versus}\quad H_1^{B}:\text{not all }\beta_j\text{'s are zero}. \qquad (4.6.3)$$

In order to develop a test procedure for the hypothesis (4.6.3), we note from (4.4.18) and (4.4.29) that under $H_0^B$,

$$E(MS_E)=\sigma_e^2,\qquad E(MS_B)=\sigma_e^2;$$

and under $H_1^B$,

$$E(MS_B)>E(MS_E).$$

Furthermore, it follows from (4.5.1) and (4.5.6) that under $H_0^B$,

$$F_B=\frac{MS_B/\sigma_e^2}{MS_E/\sigma_e^2}=\frac{MS_B}{MS_E}\sim F[b-1,ab(n-1)]. \qquad (4.6.4)$$

Thus, the statistic (4.6.4) provides a suitable test procedure for (4.6.3), $H_0^B$ being rejected if

$$F_B>F[b-1,ab(n-1);1-\alpha].$$


Test for Factor A Effects 
The hypothesis is

$$H_0^{A}:\text{all }\alpha_i=0\quad\text{versus}\quad H_1^{A}:\text{not all }\alpha_i\text{'s are zero}. \qquad (4.6.5)$$

Proceeding as in the test for factor B effects, it readily follows that the statistic

$$F_A=\frac{MS_A/\sigma_e^2}{MS_E/\sigma_e^2}=\frac{MS_A}{MS_E}\sim F[a-1,ab(n-1)]$$

provides a suitable test procedure for (4.6.5), $H_0^A$ being rejected if

$$F_A>F[a-1,ab(n-1);1-\alpha].$$
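The three Model I tests share the same denominator, $MS_E$. The sketch below assumes NumPy and SciPy, and the function name is hypothetical. It computes $F_{AB}$, $F_B$, and $F_A$ for a balanced `(a, b, n)` array. It compares each statistic with the upper $1-\alpha$ quantile from `scipy.stats.f.ppf` and also reports the tail probability from `scipy.stats.f.sf`. The simulated data, with a deliberately large A effect, are purely illustrative.

```python
import numpy as np
from scipy import stats

def model_i_f_tests(y, level=0.05):
    """Model I F tests (4.6.2), (4.6.4), and F_A; y has shape (a, b, n)."""
    a, b, n = y.shape
    g = y.mean()
    r = y.mean(axis=(1, 2))
    c = y.mean(axis=(0, 2))
    m = y.mean(axis=2)
    ss = {
        "A": b * n * np.sum((r - g) ** 2),
        "B": a * n * np.sum((c - g) ** 2),
        "AB": n * np.sum((m - r[:, None] - c[None, :] + g) ** 2),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1)}
    df_e = a * b * (n - 1)
    ms_e = np.sum((y - m[:, :, None]) ** 2) / df_e
    out = {}
    for eff in ("AB", "B", "A"):                 # interaction first, as in the text
        f_stat = (ss[eff] / df[eff]) / ms_e      # every Model I test uses MS_E
        crit = stats.f.ppf(1.0 - level, df[eff], df_e)
        out[eff] = {"F": f_stat,
                    "p": stats.f.sf(f_stat, df[eff], df_e),
                    "reject": f_stat > crit}
    return out

rng = np.random.default_rng(3)
shift = np.array([0.0, 3.0, 6.0])[:, None, None]   # hypothetical large A effects
y = rng.normal(size=(3, 4, 5)) + shift
for eff, res in model_i_f_tests(y).items():
    print(eff, round(res["F"], 2), round(res["p"], 4), res["reject"])
```

Rejecting when $F$ exceeds the critical value is equivalent to rejecting when the tail probability falls below the chosen level.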


Remarks: (i) If a nonsignificant value of $F_{AB}$ occurs, some authors suggest pooling the $MS_{AB}$ and $MS_E$ terms of the analysis of variance Table 4.2 to obtain a better estimate of the error term, namely,

$$\frac{(a-1)(b-1)MS_{AB}+ab(n-1)MS_E}{(a-1)(b-1)+ab(n-1)}=\frac{SS_{AB}+SS_E}{abn-a-b+1}.$$

The reason put forth is that when no interactions exist, $E(MS_{AB})=\sigma_e^2$, the same expectation as for $MS_E$. The pooled estimator of $\sigma_e^2$ would then have a larger number of degrees of freedom associated with it. However, this practice is not always recommended, since a nonsignificant F value does not mean that the hypothesis is true. In other words, there is always the possibility of some interaction being present and not showing up in the F test. Hence, $MS_E$ should always be taken as the best estimate of $\sigma_e^2$, unless the experimenter has additional information confirming the nonexistence of interaction terms. Moreover, the pooling procedure affects both the level of significance and the power of the tests for factor A and factor B effects, in ways that are not yet fully explored. It is generally recommended that pooling should not be undertaken unless:


(a) the degrees of freedom associated with $MS_E$ are too small; and
(b) the calculated value of the test statistic $MS_{AB}/MS_E$ falls well below the critical value. Some authors recommend that $MS_{AB}$ should be nearly equal to $MS_E$.


Part (a) of this rule is intended to limit pooling to cases where the gains may indeed be important. Part (b) is meant to provide reasonable assurance that there are in fact no interactions. For some general rules of thumb for deciding when to pool, see Paull (1950), Bozivich et al. (1956), Srivastava and Bozivich (1962), and Mead et al. (1975).
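The pooled error term in Remark (i) is simply a degrees-of-freedom weighted average of $MS_{AB}$ and $MS_E$. The plain-Python sketch below computes it; the function name and the example sums of squares are hypothetical. The cautions in the remark still apply, since the sketch only does the arithmetic.

```python
def pooled_error_ms(ss_ab, ss_e, a, b, n):
    """Pooled estimate of sigma_e^2: (SS_AB + SS_E) / (abn - a - b + 1)."""
    df_ab = (a - 1) * (b - 1)
    df_e = a * b * (n - 1)
    df_pooled = df_ab + df_e            # equals abn - a - b + 1
    return (ss_ab + ss_e) / df_pooled, df_pooled

# Hypothetical values: a = 3, b = 4, n = 2 gives 6 + 12 = 18 pooled df.
print(pooled_error_ms(6.0, 12.0, 3, 4, 2))  # (1.0, 18)
```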

(ii) It may be of interest to derive the significance level associated with the experiment as a whole. Let $\alpha_1$, $\alpha_2$, and $\alpha_3$ be the significance levels of the F ratios $F_A$, $F_B$, and $F_{AB}$, respectively, and let $\alpha$ be the overall significance level comprising all three tests. Then it can be shown that (Kimball (1951)) $\alpha\le 1-(1-\alpha_1)(1-\alpha_2)(1-\alpha_3)$. For example, if $\alpha_1=\alpha_2=\alpha_3=0.05$, then $\alpha\le 1-(1-0.05)^3=0.143$. Similarly, if $\alpha_1=\alpha_2=\alpha_3=0.01$, then $\alpha\le 0.030$.
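The Kimball bound is a one-line product. The plain-Python sketch below reproduces the two numerical cases quoted above; the function name is illustrative.

```python
def kimball_bound(*levels):
    """Upper bound 1 - prod(1 - alpha_i) on the overall significance level."""
    prod = 1.0
    for lv in levels:
        prod *= 1.0 - lv
    return 1.0 - prod

print(round(kimball_bound(0.05, 0.05, 0.05), 3))  # 0.143
print(round(kimball_bound(0.01, 0.01, 0.01), 3))  # 0.03
```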


MODEL II (RANDOM EFFECTS) 


In Model II, as in Model I, the usual hypotheses of interest are about AB 
interaction, factor B effects, and factor A effects. 


Test for AB Interactions 
The presence of interaction terms is tested by the hypothesis

$$H_0^{AB}:\sigma_{\alpha\beta}^2=0\quad\text{versus}\quad H_1^{AB}:\sigma_{\alpha\beta}^2>0. \qquad (4.6.6)$$

To obtain an appropriate test procedure for the hypothesis (4.6.6), we note from (4.4.18) and (4.4.43) that under $H_0^{AB}$,

$$E(MS_E)=\sigma_e^2,\qquad E(MS_{AB})=\sigma_e^2;$$

and under $H_1^{AB}$,

$$E(MS_{AB})>E(MS_E).$$

Furthermore, it follows from (4.5.8) and (4.5.9) that under $H_0^{AB}$,

$$F_{AB}=\frac{MS_{AB}/\sigma_e^2}{MS_E/\sigma_e^2}=\frac{MS_{AB}}{MS_E}\sim F[(a-1)(b-1),ab(n-1)]. \qquad (4.6.7)$$

Thus, the statistic (4.6.7) provides a suitable test procedure for (4.6.6), $H_0^{AB}$ being rejected if

$$F_{AB}>F[(a-1)(b-1),ab(n-1);1-\alpha].$$


Test for Factor B Effects 
The presence of factor B effects is tested by the hypothesis

$$H_0^{B}:\sigma_\beta^2=0\quad\text{versus}\quad H_1^{B}:\sigma_\beta^2>0. \qquad (4.6.8)$$

Again, we note from (4.4.43) and (4.4.49) that under $H_0^B$,

$$E(MS_{AB})=\sigma_e^2+n\sigma_{\alpha\beta}^2,\qquad E(MS_B)=\sigma_e^2+n\sigma_{\alpha\beta}^2;$$

and under $H_1^B$,

$$E(MS_B)>E(MS_{AB}).$$

Furthermore, it follows from (4.5.9) and (4.5.10) that under $H_0^B$,

$$F_B=\frac{MS_B/(\sigma_e^2+n\sigma_{\alpha\beta}^2)}{MS_{AB}/(\sigma_e^2+n\sigma_{\alpha\beta}^2)}=\frac{MS_B}{MS_{AB}}\sim F[b-1,(a-1)(b-1)]. \qquad (4.6.9)$$

Thus, the statistic (4.6.9) provides a suitable test procedure for (4.6.8), $H_0^B$ being rejected if

$$F_B>F[b-1,(a-1)(b-1);1-\alpha].$$


Test for Factor A Effects 
The presence of factor A effects is tested by the hypothesis

$$H_0^{A}:\sigma_\alpha^2=0\quad\text{versus}\quad H_1^{A}:\sigma_\alpha^2>0. \qquad (4.6.10)$$

Proceeding as in the test for factor B effects, it readily follows that the statistic

$$F_A=\frac{MS_A/(\sigma_e^2+n\sigma_{\alpha\beta}^2)}{MS_{AB}/(\sigma_e^2+n\sigma_{\alpha\beta}^2)}=\frac{MS_A}{MS_{AB}}\sim F[a-1,(a-1)(b-1)] \qquad (4.6.11)$$

provides a suitable test procedure for (4.6.10), $H_0^A$ being rejected if

$$F_A>F[a-1,(a-1)(b-1);1-\alpha].$$


Remarks: (i) The more general hypotheses of interest may be

$$\frac{\sigma_{\alpha\beta}^2}{\sigma_e^2}\le\rho_1,\qquad \frac{\sigma_\beta^2}{\sigma_e^2+n\sigma_{\alpha\beta}^2}\le\rho_2,\qquad \frac{\sigma_\alpha^2}{\sigma_e^2+n\sigma_{\alpha\beta}^2}\le\rho_3,$$

which are tested in the obvious way.

(ii) One of the most important differences between the tests of hypotheses in Models I and II concerns the denominator. When the factors have random effects, each main effect is tested by using the interaction mean square in the denominator. When they have fixed effects, one must divide by the error mean square.
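"Tested in the obvious way" can be made explicit for the second hypothesis in Remark (i). By (4.5.9) and (4.5.10), $(MS_B/MS_{AB})/(1+an\rho)\sim F[b-1,(a-1)(b-1)]$, where $\rho=\sigma_\beta^2/(\sigma_e^2+n\sigma_{\alpha\beta}^2)$. The test of $\rho\le\rho_2$ therefore uses the statistic $(1+an\rho_2)^{-1}(MS_B/MS_{AB})$. This is a minimal sketch assuming SciPy; the function name and example mean squares are hypothetical. With $\rho_2=0$ it reduces to $F_B$ in (4.6.9).

```python
from scipy import stats

def ratio_test_b(ms_b, ms_ab, a, b, n, rho2, level=0.05):
    """Model II test of H0: sigma_b^2 / (sigma_e^2 + n sigma_ab^2) <= rho2.

    At the boundary rho = rho2, (MS_B / MS_AB) / (1 + a n rho2) ~ F[b - 1, (a-1)(b-1)].
    """
    stat = (ms_b / ms_ab) / (1.0 + a * n * rho2)
    df1, df2 = b - 1, (a - 1) * (b - 1)
    p = stats.f.sf(stat, df1, df2)
    return stat, p, p < level

# Hypothetical mean squares with a = 4, b = 5, n = 3.
print(ratio_test_b(30.0, 2.0, 4, 5, 3, rho2=0.0))   # ordinary F_B = 15
print(ratio_test_b(30.0, 2.0, 4, 5, 3, rho2=0.5))   # 15 / (1 + 6) = 15/7
```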


MODEL III (MIXED EFFECTS) 


Under Model III, the hypotheses of interest are $\sigma_{\alpha\beta}^2=0$, $\sigma_\beta^2=0$, and all $\alpha_i=0$. The appropriate tests are obtained in the same way as in the case of Models I and II.


Test for AB Interactions 

The hypothesis $H_0^{AB}:\sigma_{\alpha\beta}^2=0$ versus $H_1^{AB}:\sigma_{\alpha\beta}^2>0$ may be tested by the ratio $MS_{AB}/MS_E$. Under $H_0^{AB}$ this ratio has an F distribution with $(a-1)(b-1)$ and $ab(n-1)$ degrees of freedom. Similarly, the hypothesis $\sigma_{\alpha\beta}^2/\sigma_e^2\le\rho_1$ can be tested in the obvious way by using the statistic $(1+n\rho_1)^{-1}(MS_{AB}/MS_E)$.


Test for Factor B Effects

The hypothesis $H_0^{B}:\sigma_\beta^2=0$ versus $H_1^{B}:\sigma_\beta^2>0$ may be tested by the ratio $MS_B/MS_E$. Under $H_0^B$ this ratio has an F distribution with $b-1$ and $ab(n-1)$ degrees of freedom. Similarly, the hypothesis $\sigma_\beta^2/\sigma_e^2\le\rho_2$ may be tested in the obvious way by the test statistic $(1+an\rho_2)^{-1}(MS_B/MS_E)$.


Test for Factor A Effects

The hypothesis $H_0^{A}$: all $\alpha_i=0$ versus $H_1^{A}$: not all $\alpha_i=0$ may be tested by the ratio $MS_A/MS_{AB}$. Under $H_0^A$ this ratio has an F distribution with $a-1$ and $(a-1)(b-1)$ degrees of freedom.
Remarks: (i) The tests for the main effects under Model III work conversely to the tests under Models I and II. Here, the test statistic for the random factor B is obtained by dividing $MS_B$ by $MS_E$. Had this factor occurred under Model II, the statistic would be obtained by dividing $MS_B$ by $MS_{AB}$. Similarly, the fixed factor A is tested by dividing $MS_A$ by $MS_{AB}$. Had this factor occurred under Model I, the test would have been made by dividing $MS_A$ by $MS_E$. These results on tests of hypotheses were first developed by Johnson (1948).

(ii) If interaction terms are nonsignificant, one may want to test the hypothesis $H_0^A$: all $\alpha_i=0$ by the statistic $MS_A/MS_E$, which may provide more degrees of freedom for the denominator. The possibility of pooling $MS_{AB}$ and $MS_E$ may also be considered if degrees of freedom are few. The earlier comments on pooling also apply here.


SUMMARY OF MODELS AND TESTS 


The appropriate test statistics for Models I, II, and III developed in this section 
are summarized in Table 4.3. 


TABLE 4.3

Test Statistics for Models I, II, and III

Hypothesized      Model I              Model II             Model III
Effect            (A and B Fixed)      (A and B Random)     (A Fixed, B Random)
Factor A          $MS_A/MS_E$          $MS_A/MS_{AB}$       $MS_A/MS_{AB}$
Factor B          $MS_B/MS_E$          $MS_B/MS_{AB}$       $MS_B/MS_E$
Interaction AB    $MS_{AB}/MS_E$       $MS_{AB}/MS_E$       $MS_{AB}/MS_E$
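Table 4.3 can be encoded as a lookup from model and effect to the denominator mean square. The sketch below assumes SciPy for the F tail probability. The mean squares and degrees of freedom in the example are hypothetical, corresponding to $a=3$, $b=4$, $n=3$. The point is that the same four mean squares give different F ratios depending on which effects are fixed and which are random.

```python
from scipy import stats

# Denominator mean square for each effect, following Table 4.3.
DENOMINATOR = {
    "I":   {"A": "E",  "B": "E",  "AB": "E"},   # A and B fixed
    "II":  {"A": "AB", "B": "AB", "AB": "E"},   # A and B random
    "III": {"A": "AB", "B": "E",  "AB": "E"},   # A fixed, B random
}

def f_tests(ms, df, model):
    """Return {effect: (F, p-value)} for A, B, AB under model "I", "II", or "III"."""
    out = {}
    for eff, den in DENOMINATOR[model].items():
        f_stat = ms[eff] / ms[den]
        out[eff] = (f_stat, stats.f.sf(f_stat, df[eff], df[den]))
    return out

ms = {"A": 40.0, "B": 12.0, "AB": 4.0, "E": 2.0}   # hypothetical mean squares
df = {"A": 2, "B": 3, "AB": 6, "E": 24}            # a = 3, b = 4, n = 3
for model in ("I", "II", "III"):
    print(model, {k: round(v[0], 2) for k, v in f_tests(ms, df, model).items()})
```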


4.7 POINT ESTIMATION 


In this section, we present results on point estimation for parameters of interest 
under fixed, random, and mixed effects models. 


MODEL I (FIXED EFFECTS)


The least squares estimators⁴ of the parameters $\mu$, $\alpha_i$'s, $\beta_j$'s, and $(\alpha\beta)_{ij}$'s for the model (4.1.1) are obtained by minimizing

$$Q=\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}\big[y_{ijk}-\mu-\alpha_i-\beta_j-(\alpha\beta)_{ij}\big]^2, \qquad (4.7.1)$$


4 The least squares estimators in this case are the same as those obtained by the maximum 
likelihood method under the assumption of normality. 
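with respect to $\mu$, $\alpha_i$, $\beta_j$, and $(\alpha\beta)_{ij}$, subject to the restrictions

$$\sum_{i=1}^{a}\alpha_i=\sum_{j=1}^{b}\beta_j=\sum_{i=1}^{a}(\alpha\beta)_{ij}=\sum_{j=1}^{b}(\alpha\beta)_{ij}=0. \qquad (4.7.2)$$

When one performs this minimization, it can be shown by the method of elementary calculus that the following least squares estimators are obtained:

$$\hat\mu=\bar y_{...}, \qquad (4.7.3)$$

$$\hat\alpha_i=\bar y_{i..}-\bar y_{...},\quad i=1,2,\ldots,a, \qquad (4.7.4)$$

$$\hat\beta_j=\bar y_{.j.}-\bar y_{...},\quad j=1,2,\ldots,b, \qquad (4.7.5)$$

and

$$\widehat{(\alpha\beta)}_{ij}=\bar y_{ij.}-\bar y_{i..}-\bar y_{.j.}+\bar y_{...},\quad i=1,\ldots,a;\ j=1,\ldots,b. \qquad (4.7.6)$$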
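These are the so-called best linear unbiased estimators (BLUE). The variances of the estimators (4.7.3) through (4.7.6) are:

$$\operatorname{Var}(\hat\mu)=\sigma_e^2/abn, \qquad (4.7.7)$$
$$\operatorname{Var}(\hat\alpha_i)=(a-1)\sigma_e^2/abn, \qquad (4.7.8)$$
$$\operatorname{Var}(\hat\beta_j)=(b-1)\sigma_e^2/abn, \qquad (4.7.9)$$
and
$$\operatorname{Var}\big(\widehat{(\alpha\beta)}_{ij}\big)=(a-1)(b-1)\sigma_e^2/abn. \qquad (4.7.10)$$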


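The other parameters of interest may include:

- $\mu+\alpha_i$ (mean levels of factor A);
- $\mu+\beta_j$ (mean levels of factor B);
- $\mu+\alpha_i+\beta_j+(\alpha\beta)_{ij}$ (the cell means);
- the pairwise differences $\alpha_i-\alpha_{i'}$ and $\beta_j-\beta_{j'}$;
- the contrasts

$$\sum_{i=1}^{a}\ell_i\alpha_i\ \Big(\sum_{i=1}^{a}\ell_i=0\Big),\qquad \sum_{j=1}^{b}\ell'_j\beta_j\ \Big(\sum_{j=1}^{b}\ell'_j=0\Big),\qquad\text{and}\qquad \sum_{i=1}^{a}\sum_{j=1}^{b}\ell''_{ij}(\alpha\beta)_{ij}\ \Big(\sum_{i=1}^{a}\sum_{j=1}^{b}\ell''_{ij}=0\Big).$$

Their respective estimates, together with their variances, are given by

$$\widehat{\mu+\alpha_i}=\bar y_{i..},\qquad \operatorname{Var}(\widehat{\mu+\alpha_i})=\sigma_e^2/bn; \qquad (4.7.11)$$

$$\widehat{\mu+\beta_j}=\bar y_{.j.},\qquad \operatorname{Var}(\widehat{\mu+\beta_j})=\sigma_e^2/an; \qquad (4.7.12)$$

$$\widehat{\mu+\alpha_i+\beta_j+(\alpha\beta)_{ij}}=\bar y_{ij.},\qquad \operatorname{Var}\big(\widehat{\mu+\alpha_i+\beta_j+(\alpha\beta)_{ij}}\big)=\sigma_e^2/n; \qquad (4.7.13)$$

$$\widehat{\alpha_i-\alpha_{i'}}=\bar y_{i..}-\bar y_{i'..},\qquad \operatorname{Var}(\widehat{\alpha_i-\alpha_{i'}})=2\sigma_e^2/bn; \qquad (4.7.14)$$

$$\widehat{\beta_j-\beta_{j'}}=\bar y_{.j.}-\bar y_{.j'.},\qquad \operatorname{Var}(\widehat{\beta_j-\beta_{j'}})=2\sigma_e^2/an; \qquad (4.7.15)$$

$$\widehat{\sum_{i=1}^{a}\ell_i\alpha_i}=\sum_{i=1}^{a}\ell_i\bar y_{i..},\qquad \operatorname{Var}\Big(\widehat{\sum_{i=1}^{a}\ell_i\alpha_i}\Big)=\Big(\sum_{i=1}^{a}\ell_i^2\Big)\sigma_e^2/bn; \qquad (4.7.16)$$

$$\widehat{\sum_{j=1}^{b}\ell'_j\beta_j}=\sum_{j=1}^{b}\ell'_j\bar y_{.j.},\qquad \operatorname{Var}\Big(\widehat{\sum_{j=1}^{b}\ell'_j\beta_j}\Big)=\Big(\sum_{j=1}^{b}\ell'^2_j\Big)\sigma_e^2/an; \qquad (4.7.17)$$

and

$$\widehat{\sum_{i=1}^{a}\sum_{j=1}^{b}\ell''_{ij}(\alpha\beta)_{ij}}=\sum_{i=1}^{a}\sum_{j=1}^{b}\ell''_{ij}(\bar y_{ij.}-\bar y_{i..}-\bar y_{.j.}+\bar y_{...}), \qquad (4.7.18)$$

$$\operatorname{Var}\Big(\widehat{\sum_{i=1}^{a}\sum_{j=1}^{b}\ell''_{ij}(\alpha\beta)_{ij}}\Big)=\Big[\sum_{i=1}^{a}\sum_{j=1}^{b}\ell''^2_{ij}-\frac{1}{b}\sum_{i=1}^{a}\Big(\sum_{j=1}^{b}\ell''_{ij}\Big)^2-\frac{1}{a}\sum_{j=1}^{b}\Big(\sum_{i=1}^{a}\ell''_{ij}\Big)^2\Big]\frac{\sigma_e^2}{n}. \qquad (4.7.19)$$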


The best quadratic unbiased estimator of $\sigma_e^2$ is, of course, provided by the error mean square.
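The Model I estimators (4.7.3)-(4.7.6) and the interaction-contrast variance (4.7.19) translate directly into array operations. The sketch below assumes NumPy, and the function names are hypothetical. It checks that the estimates satisfy the restrictions (4.7.2). It then evaluates (4.7.18)-(4.7.19) for an illustrative contrast whose coefficients sum to zero.

```python
import numpy as np

def fixed_effect_estimates(y):
    """Least squares estimates (4.7.3)-(4.7.6) for a balanced array y of shape (a, b, n)."""
    g = y.mean()
    r = y.mean(axis=(1, 2))
    c = y.mean(axis=(0, 2))
    m = y.mean(axis=2)
    return {"mu": g, "alpha": r - g, "beta": c - g,
            "ab": m - r[:, None] - c[None, :] + g}

def interaction_contrast(y, coef, sigma2_e):
    """Estimate (4.7.18) and variance (4.7.19) of sum_ij coef[i, j] * (ab)_ij.

    coef has shape (a, b) and must sum to zero over all cells.
    """
    a, b, n = y.shape
    coef = np.asarray(coef, dtype=float)
    if not np.isclose(coef.sum(), 0.0):
        raise ValueError("contrast coefficients must sum to zero")
    est = np.sum(coef * fixed_effect_estimates(y)["ab"])
    row = coef.sum(axis=1)            # sum over j, for each i
    col = coef.sum(axis=0)            # sum over i, for each j
    var = (np.sum(coef ** 2) - np.sum(row ** 2) / b - np.sum(col ** 2) / a) * sigma2_e / n
    return est, var

rng = np.random.default_rng(7)
y = rng.normal(size=(3, 4, 2))        # simulated data: a = 3, b = 4, n = 2
est = fixed_effect_estimates(y)
# The estimates satisfy the restrictions (4.7.2) up to rounding error.
print(np.isclose(est["alpha"].sum(), 0.0), np.isclose(est["beta"].sum(), 0.0))
print(np.allclose(est["ab"].sum(axis=0), 0.0), np.allclose(est["ab"].sum(axis=1), 0.0))

coef = np.array([[1.0, -1.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 0.0],
                 [-1.0, 1.0, 0.0, 0.0]])   # hypothetical interaction contrast
print(interaction_contrast(y, coef, sigma2_e=1.0))
```

For this particular contrast, every row and column of coefficients sums to zero, so (4.7.19) reduces to $\sum\sum\ell''^2_{ij}\sigma_e^2/n$.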


MODEL II (RANDOM EFFECTS) 


In the case of the random effects model, for random factors (that have significant effects), one would often like to estimate the magnitude of the variance components. As before, unbiased point estimators can be readily obtained by using linear combinations of the expected mean squares in the analysis of variance Table 4.2. For instance, $\sigma_\alpha^2$ can be estimated by noting that

$$E(MS_A) - E(MS_{AB}) = bn\sigma_\alpha^2.$$

Hence, an unbiased estimator of $\sigma_\alpha^2$ is given by

$$\hat\sigma_\alpha^2 = \frac{MS_A - MS_{AB}}{bn}. \tag{4.7.20}$$

Similarly,

$$\hat\sigma_\beta^2 = \frac{MS_B - MS_{AB}}{an}, \tag{4.7.21}$$
$$\hat\sigma_{\alpha\beta}^2 = \frac{MS_{AB} - MS_E}{n}, \tag{4.7.22}$$

and

$$\hat\sigma_e^2 = MS_E. \tag{4.7.23}$$

As usual, the parameter $\mu$ is estimated by $\hat\mu = \bar y_{...}$. The estimators (4.7.20) through (4.7.23) are the so-called minimum variance quadratic unbiased estimators, or the minimum variance unbiased estimators under the assumption of normality. They may, however, produce negative estimates.⁵ If one can assume the absence of interaction terms, it is possible to use $MS_E$ in place of $MS_{AB}$ in the estimators (4.7.20) and (4.7.21). Alternatively, one can pool $MS_{AB}$ and $MS_E$ and use this pooled estimator in place of $MS_{AB}$ in the expressions for $\hat\sigma_\alpha^2$ and $\hat\sigma_\beta^2$.
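The estimators (4.7.20) through (4.7.23) are simple linear functions of the mean squares. The following is a minimal sketch with a hypothetical function name. It assumes that "pooling" means the degrees-of-freedom-weighted average $[(a-1)(b-1)MS_{AB} + ab(n-1)MS_E]/[(a-1)(b-1) + ab(n-1)]$; the text does not spell out the pooling rule. The sketch also reports which estimates came out negative.

```python
def random_effects_components(ms_a, ms_b, ms_ab, ms_e, a, b, n, pool=False):
    """ANOVA (method-of-moments) estimates (4.7.20)-(4.7.23), Model II.

    If pool=True, MS_AB and MS_E are pooled (weighted by their degrees of
    freedom, an assumption of this sketch) and the pooled value replaces
    MS_AB in the estimators of sigma_alpha^2 and sigma_beta^2.
    """
    df_ab, df_e = (a - 1) * (b - 1), a * b * (n - 1)
    denom = ms_ab
    if pool:
        denom = (df_ab * ms_ab + df_e * ms_e) / (df_ab + df_e)
    est = {
        "sigma2_alpha": (ms_a - denom) / (b * n),    # (4.7.20)
        "sigma2_beta": (ms_b - denom) / (a * n),     # (4.7.21)
        "sigma2_alphabeta": (ms_ab - ms_e) / n,      # (4.7.22)
        "sigma2_e": ms_e,                            # (4.7.23)
    }
    # The text warns that these estimates can be negative; flag such cases.
    negative = [name for name, value in est.items() if value < 0]
    return est, negative
```

For example, with $MS_A = 50$, $MS_B = 30$, $MS_{AB} = 10$, $MS_E = 4$, $a = 3$, $b = 4$, $n = 2$, the sketch returns $\hat\sigma_\alpha^2 = 5$, $\hat\sigma_\beta^2 = 20/6$, $\hat\sigma_{\alpha\beta}^2 = 3$, and $\hat\sigma_e^2 = 4$.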


MODEL III (MIXED EFFECTS) 


In the case of a mixed effects model with A fixed and B random, the parameters to be estimated are $\mu$, the $\alpha_i$'s, $\sigma_\beta^2$, $\sigma_{\alpha\beta}^2$, and $\sigma_e^2$. From the expected mean squares column under Model III of Table 4.2, $\sigma_\beta^2$, $\sigma_{\alpha\beta}^2$, and $\sigma_e^2$ are estimated unbiasedly by

$$\hat\sigma_\beta^2 = \frac{MS_B - MS_E}{an}, \tag{4.7.24}$$
$$\hat\sigma_{\alpha\beta}^2 = \frac{MS_{AB} - MS_E}{n}, \tag{4.7.25}$$

and

$$\hat\sigma_e^2 = MS_E. \tag{4.7.26}$$

Thus, although $\sigma_{\alpha\beta}^2$ and $\sigma_e^2$ have the same estimates as in the case of the random effects model, the estimate of $\sigma_\beta^2$ is different. This general approach of estimating variance components can be used in any mixed model: after deleting the mean squares containing fixed factors, the remaining set of equations can be solved for the variance components.⁶
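In code, the only change from the random-model estimators is the mean square subtracted in the estimator of $\sigma_\beta^2$. This is a minimal sketch with a hypothetical function name, following (4.7.24) through (4.7.26).

```python
def mixed_effects_components(ms_b, ms_ab, ms_e, a, n):
    """Estimates (4.7.24)-(4.7.26) for the mixed model, A fixed and B random."""
    return {
        # (4.7.24): MS_E, not MS_AB, is subtracted here
        "sigma2_beta": (ms_b - ms_e) / (a * n),
        # (4.7.25): same form as in the random effects model
        "sigma2_alphabeta": (ms_ab - ms_e) / n,
        "sigma2_e": ms_e,                       # (4.7.26)
    }
```

With $MS_B = 30$, $MS_{AB} = 10$, $MS_E = 4$, $a = 3$, and $n = 2$, the mixed-model estimate of $\sigma_\beta^2$ is $26/6 \approx 4.33$. The random-model formula (4.7.21) would give $20/6 \approx 3.33$ for the same mean squares.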

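The fixed effects $\mu$, $\mu + \alpha_i$, and the $\alpha_i$'s are, of course, estimated by

$$\hat\mu = \bar y_{...}, \qquad \widehat{\mu+\alpha_i} = \bar y_{i..}, \qquad \hat\alpha_i = \bar y_{i..} - \bar y_{...}, \qquad i = 1, 2, \ldots, a,$$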


5 For a discussion of the nonnegative maximum likelihood estimation, see Herbach (1959) and 
Miller (1977). 

6 For a discussion of the maximum likelihood estimation in the mixed model, see Szatrowski and 
Miller (1980). 
which are the same estimates as in the case of a fixed effects model. Furthermore, comparisons involving pairwise contrasts can be estimated by

$$\hat\alpha_i - \hat\alpha_{i'} = \bar y_{i..} - \bar y_{i'..}.$$


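To evaluate the variances of the means and contrasts, we note that

$$\operatorname{Var}(\bar y_{...}) = \frac{\sigma_e^2 + an\sigma_\beta^2}{abn},$$
$$\operatorname{Var}(\bar y_{i..}) = \frac{\sigma_e^2 + n\frac{a-1}{a}\sigma_{\alpha\beta}^2 + n\sigma_\beta^2}{bn}, \tag{4.7.27}$$
$$\operatorname{Var}(\bar y_{ij.}) = \frac{\sigma_e^2 + n\frac{a-1}{a}\sigma_{\alpha\beta}^2 + n\sigma_\beta^2}{n},$$
$$\operatorname{Cov}(\bar y_{i..}, \bar y_{i'..}) = \frac{a\sigma_\beta^2 - \sigma_{\alpha\beta}^2}{ab}, \qquad i \neq i', \tag{4.7.28}$$

and

$$\operatorname{Var}(\bar y_{i..} - \bar y_{i'..}) = \frac{2(\sigma_e^2 + n\sigma_{\alpha\beta}^2)}{bn}. \tag{4.7.29}$$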
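As a worked check of (4.7.29), the variance of a difference can be assembled term by term. This assumes the restricted mixed model in which the interaction effects sum to zero over the levels of A, so that $\operatorname{Var}(\bar y_{i..}) = [\sigma_e^2 + n\frac{a-1}{a}\sigma_{\alpha\beta}^2 + n\sigma_\beta^2]/bn$ and $\operatorname{Cov}(\bar y_{i..}, \bar y_{i'..}) = \sigma_\beta^2/b - \sigma_{\alpha\beta}^2/ab$. The $\sigma_\beta^2$ contributions cancel, which is why a single mean square suffices for pairwise comparisons of the fixed factor:

$$\begin{aligned}
\operatorname{Var}(\bar y_{i..} - \bar y_{i'..}) &= \operatorname{Var}(\bar y_{i..}) + \operatorname{Var}(\bar y_{i'..}) - 2\operatorname{Cov}(\bar y_{i..}, \bar y_{i'..})\\
&= \frac{2\sigma_e^2}{bn} + \frac{2(a-1)\sigma_{\alpha\beta}^2}{ab} + \frac{2\sigma_\beta^2}{b} - \frac{2\sigma_\beta^2}{b} + \frac{2\sigma_{\alpha\beta}^2}{ab}\\
&= \frac{2\sigma_e^2}{bn} + \frac{2\sigma_{\alpha\beta}^2}{b} = \frac{2(\sigma_e^2 + n\sigma_{\alpha\beta}^2)}{bn}.
\end{aligned}$$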


4.8 INTERVAL ESTIMATION 


In this section, we present results on confidence intervals for parameters of 
interest under fixed, random, and mixed effects models. 


MODEL I (FIXED EFFECTS) 


An exact confidence interval for $\sigma_e^2$ can be based on the chi-square distribution of $ab(n-1)MS_E/\sigma_e^2$. Thus, a $1-\alpha$ level confidence interval for $\sigma_e^2$ is

$$\frac{ab(n-1)MS_E}{\chi^2[ab(n-1), 1-\alpha/2]} < \sigma_e^2 < \frac{ab(n-1)MS_E}{\chi^2[ab(n-1), \alpha/2]}. \tag{4.8.1}$$

Furthermore, it is possible to obtain confidence intervals based on the $t$ distribution for a particular $\alpha_i$ or a particular difference $\alpha_i - \alpha_{i'}$. For example,

$$(\bar y_{i..} - \bar y_{...}) - t[ab(n-1), 1-\alpha/2]\sqrt{(a-1)MS_E/abn} < \alpha_i < (\bar y_{i..} - \bar y_{...}) + t[ab(n-1), 1-\alpha/2]\sqrt{(a-1)MS_E/abn} \tag{4.8.2}$$

defines a $1-\alpha$ level confidence interval for $\alpha_i$.
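The intervals (4.8.1) and (4.8.2) translate directly into code. To keep this sketch free of dependencies, the chi-square and $t$ quantiles are passed in as arguments. They can be read from tables or, if SciPy is available, obtained with `scipy.stats.chi2.ppf` and `scipy.stats.t.ppf`. The function names are hypothetical.

```python
import math


def sigma2_e_interval(ms_e, df_e, chi2_low, chi2_high):
    """Interval (4.8.1) for sigma_e^2.

    chi2_low = chi^2[df_e, alpha/2] and chi2_high = chi^2[df_e, 1 - alpha/2],
    where df_e = ab(n - 1).
    """
    ss_e = df_e * ms_e                      # ab(n - 1) MS_E
    return ss_e / chi2_high, ss_e / chi2_low


def alpha_i_interval(ybar_i, ybar, ms_e, a, b, n, t_crit):
    """Interval (4.8.2) for alpha_i; t_crit = t[ab(n - 1), 1 - alpha/2]."""
    est = ybar_i - ybar
    half = t_crit * math.sqrt((a - 1) * ms_e / (a * b * n))
    return est - half, est + half
```

For $a = b = 2$ and $n = 4$, the error degrees of freedom are 12. The tabled values $\chi^2[12, 0.025] = 4.404$, $\chi^2[12, 0.975] = 23.337$, and $t[12, 0.975] = 2.179$ then give 95% intervals.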
Similarly, to obtain confidence limits for $\alpha_i - \alpha_{i'}$, we note from (4.7.14) that

$$E(\bar y_{i..} - \bar y_{i'..}) = \alpha_i - \alpha_{i'}$$

and

$$\operatorname{Var}(\bar y_{i..} - \bar y_{i'..}) = 2\sigma_e^2/bn.$$

The confidence limits can, therefore, be derived from the relation:

$$\frac{(\bar y_{i..} - \bar y_{i'..}) - (\alpha_i - \alpha_{i'})}{\sqrt{2MS_E/bn}} \sim t[ab(n-1)]. \tag{4.8.3}$$

Similar results on confidence intervals for the $\beta_j$'s, the $(\alpha\beta)_{ij}$'s, and any pairwise differences among them, using the $t$ distribution, can also be obtained. However, multiple comparison methods, discussed in Section 4.12, are usually preferable.
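Inverting the pivot (4.8.3) gives a single-comparison interval for $\alpha_i - \alpha_{i'}$. This is a minimal sketch with a hypothetical function name, with the $t$ quantile supplied by the caller. For several simultaneous comparisons, the multiple comparison methods mentioned above should be preferred.

```python
import math


def pairwise_diff_interval(ybar_i, ybar_ip, ms_e, b, n, t_crit):
    """Interval for alpha_i - alpha_i' from (4.8.3).

    t_crit = t[ab(n - 1), 1 - alpha/2]; this is a per-comparison interval.
    """
    diff = ybar_i - ybar_ip
    half = t_crit * math.sqrt(2.0 * ms_e / (b * n))
    return diff - half, diff + half
```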


MODEL II (RANDOM EFFECTS) 


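An exact confidence interval for the error variance component $\sigma_e^2$ is obtained as in (4.8.1). However, as indicated in Section 2.10 for the one-way random model, exact confidence intervals for the variance components $\sigma_{\alpha\beta}^2$, $\sigma_\beta^2$, and $\sigma_\alpha^2$ do not exist. One can, nevertheless, obtain exact confidence intervals for $\sigma_e^2 + n\sigma_{\alpha\beta}^2$, $\sigma_e^2 + n\sigma_{\alpha\beta}^2 + bn\sigma_\alpha^2$, $\sigma_e^2 + n\sigma_{\alpha\beta}^2 + an\sigma_\beta^2$, and for ratios of particular combinations of variance components, for example, $\sigma_{\alpha\beta}^2/\sigma_e^2$, $\sigma_\alpha^2/(\sigma_e^2 + n\sigma_{\alpha\beta}^2)$, $\sigma_\beta^2/(\sigma_e^2 + n\sigma_{\alpha\beta}^2)$, and $(\sigma_e^2 + n\sigma_{\alpha\beta}^2)/(\sigma_e^2 + n\sigma_{\alpha\beta}^2 + bn\sigma_\alpha^2)$, by taking the appropriate ratios of mean squares as discussed in Section 2.10. For a discussion of approximate confidence intervals for the variance components $\sigma_{\alpha\beta}^2$, $\sigma_\beta^2$, and $\sigma_\alpha^2$; for the ratios of variance components $\sigma_\beta^2/\sigma_e^2$, $\sigma_\alpha^2/\sigma_e^2$, and $\sigma_\alpha^2/\sigma_\beta^2$; and for the proportions of variability $\sigma_e^2/\sigma_T^2$, $\sigma_{\alpha\beta}^2/\sigma_T^2$, $\sigma_\beta^2/\sigma_T^2$, and $\sigma_\alpha^2/\sigma_T^2$, where $\sigma_T^2 = \sigma_e^2 + \sigma_{\alpha\beta}^2 + \sigma_\beta^2 + \sigma_\alpha^2$, including numerical examples, see Burdick and Graybill (1992, pp. 121-124).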


To obtain confidence limits for $\mu$, we note from (4.4.7) that

$$E(\bar y_{...}) = \mu \tag{4.8.4}$$

and

$$\operatorname{Var}(\bar y_{...}) = \frac{\sigma_e^2 + n\sigma_{\alpha\beta}^2 + an\sigma_\beta^2 + bn\sigma_\alpha^2}{abn}. \tag{4.8.5}$$

In order to get a mean square with expected value equal to the numerator in (4.8.5), we use the linear combination $MS_A + MS_B - MS_{AB}$, since it has the expected value

$$E(MS_A + MS_B - MS_{AB}) = \sigma_e^2 + n\sigma_{\alpha\beta}^2 + an\sigma_\beta^2 + bn\sigma_\alpha^2. \tag{4.8.6}$$

It can now be shown using the Satterthwaite procedure (see Appendix K) that

$$\frac{\bar y_{...} - \mu}{\sqrt{(MS_A + MS_B - MS_{AB})/abn}} \;\overset{\text{approx.}}{\sim}\; t[\nu], \tag{4.8.7}$$

where

$$\nu = \frac{(MS_A + MS_B - MS_{AB})^2}{\dfrac{MS_A^2}{a-1} + \dfrac{MS_B^2}{b-1} + \dfrac{MS_{AB}^2}{(a-1)(b-1)}}. \tag{4.8.8}$$

The confidence limits for $\mu$ can now be determined in the usual way, but they will be only approximate because of the approximation involved in (4.8.7).
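The Satterthwaite degrees of freedom (4.8.8) and the interval from (4.8.7) can be sketched as follows; the function names are hypothetical. The $t$ quantile for the (generally non-integer) $\nu$ is left to the caller. It can be interpolated from a table or obtained with `scipy.stats.t.ppf`, which accepts non-integer degrees of freedom.

```python
import math


def satterthwaite_df(coefs, mean_squares, dfs):
    """Satterthwaite degrees of freedom for sum_k c_k MS_k, as in (4.8.8)."""
    num = sum(c * m for c, m in zip(coefs, mean_squares)) ** 2
    den = sum((c * m) ** 2 / d for c, m, d in zip(coefs, mean_squares, dfs))
    return num / den


def mu_interval_random(ybar, ms_a, ms_b, ms_ab, a, b, n, t_crit):
    """Approximate interval for mu in Model II, based on (4.8.7).

    t_crit should be t[nu, 1 - alpha/2], with nu from satterthwaite_df.
    """
    comb = ms_a + ms_b - ms_ab          # unbiased for the numerator of (4.8.5)
    if comb <= 0:
        raise ValueError("MS_A + MS_B - MS_AB must be positive")
    se = math.sqrt(comb / (a * b * n))
    return ybar - t_crit * se, ybar + t_crit * se
```

For Model II, one would call `satterthwaite_df([1, 1, -1], [ms_a, ms_b, ms_ab], [a - 1, b - 1, (a - 1) * (b - 1)])`. With a single mean square and coefficient 1, the function simply returns that mean square's degrees of freedom.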


MODEL III (MIXED EFFECTS) 


An exact confidence interval for $\sigma_e^2$ is constructed as in (4.8.1). However, as in Model II, exact confidence intervals for $\sigma_\beta^2$ and $\sigma_{\alpha\beta}^2$ do not exist. One can, nevertheless, obtain exact intervals for $\sigma_{\alpha\beta}^2/\sigma_e^2$ and $\sigma_\beta^2/\sigma_e^2$ by basing the procedure on the statistics $MS_{AB}/MS_E$ and $MS_B/MS_E$, respectively. Approximate confidence intervals for $\sigma_\beta^2$ and $\sigma_{\alpha\beta}^2$ can be constructed by the method of Satterthwaite and other related procedures (see, e.g., Burdick and Graybill, 1992, p. 153). Also, as in the case of Model I, it is possible to obtain confidence intervals for $\mu$, $\alpha_i$, $\mu + \alpha_i$, $\alpha_i - \alpha_{i'}$, or the contrast $\sum_{i=1}^{a}\ell_i\alpha_i$ $\big(\sum_{i=1}^{a}\ell_i = 0\big)$. For example, an exact confidence interval for $\sum_{i=1}^{a}\ell_i\alpha_i$, with confidence coefficient $1-\alpha$, is given by

$$\sum_{i=1}^{a}\ell_i\bar y_{i..} - t[(a-1)(b-1), 1-\alpha/2]\sqrt{MS_{AB}\sum_{i=1}^{a}\ell_i^2/bn} < \sum_{i=1}^{a}\ell_i\alpha_i < \sum_{i=1}^{a}\ell_i\bar y_{i..} + t[(a-1)(b-1), 1-\alpha/2]\sqrt{MS_{AB}\sum_{i=1}^{a}\ell_i^2/bn}. \tag{4.8.9}$$

Thus, when dealing with a mixed model, the appropriate mean square to be used in the estimated variance formula is no longer $MS_E$. A simple rule to determine the appropriate mean square is: use the mean square employed in the denominator of the test statistic for testing the presence of the fixed factor under consideration. For instance, with the mixed model (4.1.1) where A is fixed and B random, $MS_{AB}$ is the appropriate mean square (see Table 4.3). The degrees of freedom in constructing the confidence interval are those associated with the mean square used to estimate the variance of the contrast. However, it is not always possible to find an appropriate mean square for the desired variance estimate. For example, in order to estimate $\mu + \alpha_i$, we notice from (4.7.27) and Table 4.2 that no single mean square estimates the desired variance. An unbiased estimate of $\operatorname{Var}(\bar y_{i..})$, however, can be obtained by using an appropriate linear combination of the mean squares. An approximate $1-\alpha$ level confidence interval for $\mu + \alpha_i$ can then be constructed using

$$\bar y_{i..} \pm t[\nu, 1-\alpha/2]\sqrt{\widehat{\operatorname{Var}}(\bar y_{i..})},$$

where the degrees of freedom $\nu$ are estimated using the Satterthwaite procedure, similar to equation (4.8.8).
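Both mixed-model intervals can be sketched in a few lines; the function names are hypothetical. The first function implements the exact contrast interval (4.8.9). The second builds an unbiased estimate of $\operatorname{Var}(\bar y_{i..})$ from the linear combination $[MS_B + (a-1)MS_{AB}]/abn$ and its Satterthwaite degrees of freedom. That combination is derived here, not quoted from the text: it assumes $E(MS_B) = \sigma_e^2 + an\sigma_\beta^2$ and $E(MS_{AB}) = \sigma_e^2 + n\sigma_{\alpha\beta}^2$, the forms implied by (4.7.24) and (4.7.25). Under those forms, its expectation equals $[\sigma_e^2 + n\frac{a-1}{a}\sigma_{\alpha\beta}^2 + n\sigma_\beta^2]/bn$.

```python
import math


def contrast_interval_mixed(ell, ybar_rows, ms_ab, b, n, t_crit):
    """Exact interval (4.8.9) for sum_i ell_i alpha_i (A fixed, B random).

    MS_AB replaces MS_E; t_crit = t[(a - 1)(b - 1), 1 - alpha/2].
    """
    if abs(sum(ell)) > 1e-9:
        raise ValueError("contrast coefficients must sum to zero")
    est = sum(l * y for l, y in zip(ell, ybar_rows))
    half = t_crit * math.sqrt(ms_ab * sum(l * l for l in ell) / (b * n))
    return est - half, est + half


def var_mean_level_mixed(ms_b, ms_ab, a, b, n):
    """Estimate of Var(ybar_i..) and its Satterthwaite degrees of freedom.

    Assumes E(MS_B) = s_e^2 + a n s_beta^2 and E(MS_AB) = s_e^2 + n s_ab^2.
    """
    comb = ms_b + (a - 1) * ms_ab
    var_hat = comb / (a * b * n)
    nu = comb ** 2 / (ms_b ** 2 / (b - 1)
                      + ((a - 1) * ms_ab) ** 2 / ((a - 1) * (b - 1)))
    return var_hat, nu
```

An approximate interval for $\mu + \alpha_i$ is then $\bar y_{i..} \pm t[\nu, 1-\alpha/2]\sqrt{\texttt{var\_hat}}$. Because both coefficients are positive, $\nu$ always lies between $\min(b-1, (a-1)(b-1))$ and $(b-1) + (a-1)(b-1)$.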


4.9 COMPUTATIONAL FORMULAE AND PROCEDURE 


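As in Section 3.9, we use the following computational formulae, which are algebraically identical to the definitional formulae given in (4.3.1):

$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}^2 - \frac{y_{...}^2}{abn},$$
$$SS_A = \frac{1}{bn}\sum_{i=1}^{a} y_{i..}^2 - \frac{y_{...}^2}{abn},$$
$$SS_B = \frac{1}{an}\sum_{j=1}^{b} y_{.j.}^2 - \frac{y_{...}^2}{abn},$$
$$SS_{AB} = \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij.}^2 - \frac{1}{bn}\sum_{i=1}^{a} y_{i..}^2 - \frac{1}{an}\sum_{j=1}^{b} y_{.j.}^2 + \frac{y_{...}^2}{abn},$$

and

$$SS_E = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}^2 - \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij.}^2,$$

where, as before, a dot in the subscript indicates the total over the variable represented by the index. Ordinarily, the interaction sum of squares is obtained by the relation

$$SS_{AB} = SS_T - SS_A - SS_B - SS_E$$

or

$$SS_{AB} = SS_{TC} - SS_A - SS_B,$$

where

$$SS_{TC} = \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij.}^2 - \frac{y_{...}^2}{abn}.$$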


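The computational procedure for the sums of squares can thus be performed in a systematic manner by the following sequence of steps:

(i) Compute the cell totals: $y_{11.}, y_{12.}, \ldots, y_{ab.}$.
(ii) Compute the row totals: $y_{1..}, y_{2..}, \ldots, y_{a..}$.
(iii) Compute the column totals: $y_{.1.}, y_{.2.}, \ldots, y_{.b.}$.
(iv) Compute the overall or grand total: $y_{...} = \sum_{i=1}^{a} y_{i..} = \sum_{j=1}^{b} y_{.j.} = \sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij.}$.
(v) Compute the raw sum of squares: $\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}^2 = y_{111}^2 + y_{112}^2 + \cdots + y_{abn}^2$.
(vi) Compute the correction factor: $y_{...}^2/abn$.
(vii) Compute $\frac{1}{bn}\sum_{i=1}^{a} y_{i..}^2$.
(viii) Compute $\frac{1}{an}\sum_{j=1}^{b} y_{.j.}^2$.
(ix) Compute $\frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij.}^2$.
(x) Compute $SS_T = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}^2 - y_{...}^2/abn$.
(xi) Compute $SS_{TC} = \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij.}^2 - y_{...}^2/abn$.
(xii) Compute $SS_A = \frac{1}{bn}\sum_{i=1}^{a} y_{i..}^2 - y_{...}^2/abn$.
(xiii) Compute $SS_B = \frac{1}{an}\sum_{j=1}^{b} y_{.j.}^2 - y_{...}^2/abn$.
(xiv) Compute $SS_{AB} = SS_T - SS_A - SS_B - SS_E = SS_{TC} - SS_A - SS_B$.
(xv) Compute $SS_E = SS_T - SS_{TC}$.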
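The step sequence above maps line by line onto a short function. This is a minimal sketch for balanced data stored as `y[i][j][k]`, with a hypothetical function name; the comments give the step numbers.

```python
def sums_of_squares(y):
    """Sums of squares for balanced data y[i][j][k], following steps (i)-(xv)."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) for j in range(b)] for i in range(a)]         # (i)    y_ij.
    row = [sum(cell[i]) for i in range(a)]                               # (ii)   y_i..
    col = [sum(cell[i][j] for i in range(a)) for j in range(b)]         # (iii)  y_.j.
    grand = sum(row)                                                     # (iv)   y_...
    raw = sum(v * v for i in range(a) for j in range(b) for v in y[i][j])  # (v)
    cf = grand ** 2 / (a * b * n)                                        # (vi)
    rows_term = sum(t * t for t in row) / (b * n)                        # (vii)
    cols_term = sum(t * t for t in col) / (a * n)                        # (viii)
    cells_term = sum(t * t for r in cell for t in r) / n                 # (ix)
    ss_t = raw - cf                                                      # (x)
    ss_tc = cells_term - cf                                              # (xi)
    ss_a = rows_term - cf                                                # (xii)
    ss_b = cols_term - cf                                                # (xiii)
    ss_ab = ss_tc - ss_a - ss_b                                          # (xiv)
    ss_e = ss_t - ss_tc                                                  # (xv)
    return {"SS_T": ss_t, "SS_A": ss_a, "SS_B": ss_b,
            "SS_AB": ss_ab, "SS_E": ss_e}
```

For `y = [[[1, 3], [5, 7]], [[2, 4], [10, 12]]]`, this gives $SS_T = 106$, $SS_A = 18$, $SS_B = 72$, $SS_{AB} = 8$, and $SS_E = 8$. The raw-minus-correction form is convenient for hand calculation. In floating point, however, it can lose precision when the data are large relative to their spread, and subtracting a constant from all observations first reduces that risk.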


4.10 ANALYSIS OF VARIANCE WITH UNEQUAL SAMPLE 
SIZES PER CELL 


In the two-way classification model discussed in this chapter, it has been assumed that each cell of the two-way Table 4.1 contains the same number of replications of the experiment. If this is not true, it is not possible to partition the overall sum of squares into independent components due to main effects and interaction terms. Care should always be taken to ensure that the number of observations in each cell is constant; but, even with the utmost care, it may happen for a variety of reasons, such as loss of subjects, incomplete records, and the like, that an experiment terminates with unequal sample sizes per cell. Moreover, unequal sample sizes are fairly common in survey-type data. For example, an agricultural analyst may wish to study the effect of temperature and precipitation on the production of certain agricultural crops from data for certain counties in the country. In this type of uncontrolled study, it may easily happen that the numbers of counties in the various temperature-precipitation categories are not equal.

The model in this case remains the same, except that the sample size corresponding to the $i$-th level of factor A and the $j$-th level of factor B is now denoted by $n_{ij}$. The data layout for a general two-way crossed classification with unequal subclass numbers is displayed in Table 4.4. Now, the total number of observations for the $i$-th level of factor A is

$$n_{i.} = \sum_{j=1}^{b} n_{ij},$$

for the $j$-th level of factor B is

$$n_{.j} = \sum_{i=1}^{a} n_{ij},$$

and the total number of observations is

$$N = \sum_{i=1}^{a} n_{i.} = \sum_{j=1}^{b} n_{.j} = \sum_{i=1}^{a}\sum_{j=1}^{b} n_{ij}.$$


When only a few values are missing, one could replace a missing value by the mean for that cell. The standard analysis of variance can then be performed, except that for each missing value being estimated, the error degrees of freedom are reduced by one. If there is a wide disparity between the numbers of observations in different cells, one can no longer use the standard analysis of variance described earlier in this chapter and must resort to other procedures. When the sample sizes for the cells are unequal, the two-way analysis of variance for factor effects becomes complex. The component sums of squares in the analysis of variance are no longer orthogonal; that is, they do not sum to the total sum of squares. The least squares method for obtaining the best estimates of the parameters is rather complicated in the fixed effects model, and the best analysis has not been, and probably will not be, found for the random effects models. In the following, we consider some common methods of analysis of variance for unequal sample size data. For a concise and readable account of the nonorthogonal two-way analysis of variance, see Herr and Gaebelein (1978).


TABLE 4.4
Data for a Two-Way Crossed Classification with n_ij Replications per Cell

                                                Factor B
Factor A   B_1                             B_2                             ...   B_b
A_1        y_{111}, y_{112}, ..., y_{11n_{11}}   y_{121}, y_{122}, ..., y_{12n_{12}}   ...   y_{1b1}, y_{1b2}, ..., y_{1bn_{1b}}
A_2        y_{211}, y_{212}, ..., y_{21n_{21}}   y_{221}, y_{222}, ..., y_{22n_{22}}   ...   y_{2b1}, y_{2b2}, ..., y_{2bn_{2b}}
...        ...                             ...                             ...   ...
A_a        y_{a11}, y_{a12}, ..., y_{a1n_{a1}}   y_{a21}, y_{a22}, ..., y_{a2n_{a2}}   ...   y_{ab1}, y_{ab2}, ..., y_{abn_{ab}}
Further details on nonorthogonal two-way analysis of variance models can be
found in a special issue of Communications in Statistics: Part A, Theory and 
Methods (Vol. 9, No. 2, 1980). 


FIXED EFFECTS ANALYSIS 


We discuss two methods for fixed effects analysis: one for the case of proportional frequencies and the other for the general case of unequal frequencies.


Proportional Frequencies 
Sometimes, the unequal sample sizes follow a proportional pattern;⁷ that is,

$$n_{ij} = \frac{n_{i.}\,n_{.j}}{N}. \tag{4.10.1}$$

The relation (4.10.1) implies that the sample sizes in any of the rows or columns are proportional. This is called the case of proportional frequencies.
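Relation (4.10.1) is easy to check mechanically for a table of cell counts. This is a minimal sketch with a hypothetical function name. It cross-multiplies so that the comparison stays exact in integer arithmetic instead of dividing by $N$.

```python
def is_proportional(counts):
    """Check relation (4.10.1), n_ij = n_i. n_.j / N, for integer cell counts."""
    a, b = len(counts), len(counts[0])
    n_row = [sum(r) for r in counts]                                  # n_i.
    n_col = [sum(counts[i][j] for i in range(a)) for j in range(b)]   # n_.j
    total = sum(n_row)                                                # N
    # n_ij * N == n_i. * n_.j avoids any floating-point division
    return all(counts[i][j] * total == n_row[i] * n_col[j]
               for i in range(a) for j in range(b))
```

For example, the counts $\begin{pmatrix}2 & 4\\ 3 & 6\end{pmatrix}$ are proportional, whereas $\begin{pmatrix}2 & 4\\ 4 & 2\end{pmatrix}$ are not.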
When the frequencies are proportional, the analysis of variance discussed 
earlier in this chapter can be employed with suitable modifications. For example, 
the definitional and computational formulae for the sums of squares are: 


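$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n_{ij}}(y_{ijk} - \bar y_{...})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n_{ij}} y_{ijk}^2 - \frac{y_{...}^2}{N},$$
$$SS_A = \sum_{i=1}^{a} n_{i.}(\bar y_{i..} - \bar y_{...})^2 = \sum_{i=1}^{a}\frac{y_{i..}^2}{n_{i.}} - \frac{y_{...}^2}{N},$$
$$SS_B = \sum_{j=1}^{b} n_{.j}(\bar y_{.j.} - \bar y_{...})^2 = \sum_{j=1}^{b}\frac{y_{.j.}^2}{n_{.j}} - \frac{y_{...}^2}{N},$$
$$SS_{AB} = \sum_{i=1}^{a}\sum_{j=1}^{b} n_{ij}(\bar y_{ij.} - \bar y_{i..} - \bar y_{.j.} + \bar y_{...})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\frac{y_{ij.}^2}{n_{ij}} - \sum_{i=1}^{a}\frac{y_{i..}^2}{n_{i.}} - \sum_{j=1}^{b}\frac{y_{.j.}^2}{n_{.j}} + \frac{y_{...}^2}{N}, \tag{4.10.2}$$

and

$$SS_E = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n_{ij}}(y_{ijk} - \bar y_{ij.})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n_{ij}} y_{ijk}^2 - \sum_{i=1}^{a}\sum_{j=1}^{b}\frac{y_{ij.}^2}{n_{ij}}.$$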


7 It is not necessary to check that the number of replicates $n_{ij}$ in each of the $ab$ cells follows the relation (4.10.1). One need only check one cell in each of $a-1$ levels of factor A and one in each of $b-1$ levels of factor B (Huck and Layne (1974)).




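TABLE 4.5
Analysis of Variance for the Unbalanced Fixed Effects Model in (4.1.1) with Proportional Frequencies

Source of           Degrees of        Sum of     Mean       Expected
Variation           Freedom           Squares    Square     Mean Square
Due to A            a - 1             SS_A       MS_A       $\sigma_e^2 + \frac{1}{a-1}\sum_{i=1}^{a} n_{i.}\alpha_i^2$
Due to B            b - 1             SS_B       MS_B       $\sigma_e^2 + \frac{1}{b-1}\sum_{j=1}^{b} n_{.j}\beta_j^2$
Interaction A x B   (a - 1)(b - 1)    SS_AB      MS_AB      $\sigma_e^2 + \frac{1}{(a-1)(b-1)}\sum_{i=1}^{a}\sum_{j=1}^{b} n_{ij}(\alpha\beta)_{ij}^2$
Error               N - ab            SS_E       MS_E       $\sigma_e^2$
Total               N - 1             SS_T

where

$$y_{ij.} = \sum_{k=1}^{n_{ij}} y_{ijk}, \quad \bar y_{ij.} = y_{ij.}/n_{ij}; \qquad y_{i..} = \sum_{j=1}^{b} y_{ij.}, \quad \bar y_{i..} = y_{i..}/n_{i.};$$
$$y_{.j.} = \sum_{i=1}^{a} y_{ij.}, \quad \bar y_{.j.} = y_{.j.}/n_{.j};$$

and

$$y_{...} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n_{ij}} y_{ijk}, \quad \bar y_{...} = y_{...}/N.$$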


The analysis of variance, including the expected mean squares, is given in Table 4.5. Tests of hypotheses for main effects and interaction can be carried out as before for the case of an equal number of observations per cell. For example, for testing the interaction effects, the statistic is $MS_{AB}/MS_E$. Under the null hypothesis $H_0^{AB}: (\alpha\beta)_{ij} = 0$ for all $i$ and $j$, this ratio has an $F$ distribution with $(a-1)(b-1)$ and $N-ab$ degrees of freedom, and the null hypothesis is rejected for large values of this ratio. If there are no significant interaction effects, the main effect due to factor A is tested by the statistic $MS_A/MS_E$, which, under the null hypothesis $H_0^{A}: \alpha_i = 0$ for all $i$, has an $F$ distribution with $a-1$ and $N-ab$ degrees of freedom. Similarly, the main effect due to factor B is tested by the statistic $MS_B/MS_E$, which, under the null hypothesis $H_0^{B}: \beta_j = 0$ for all $j$, has an $F$ distribution with $b-1$ and $N-ab$ degrees of freedom.
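The formulas (4.10.2) can be sketched for cells of unequal size as follows; the function name is hypothetical. The function also returns the definitional interaction sum of squares $\sum\sum n_{ij}(\bar y_{ij.} - \bar y_{i..} - \bar y_{.j.} + \bar y_{...})^2$. With proportional counts the two versions agree, which is what makes the usual partition valid. With disproportional counts they generally differ: for instance, counts $\begin{pmatrix}1 & 2\\ 2 & 1\end{pmatrix}$ with a single 1 in cell (2, 2) and zeros elsewhere give 0.5 versus $14/36$.

```python
def proportional_ss(y):
    """Sums of squares (4.10.2) for cells y[i][j] of sizes n_ij.

    Returns the computational sums of squares and, separately, the
    definitional interaction sum of squares for comparison.
    """
    a, b = len(y), len(y[0])
    n = [[len(y[i][j]) for j in range(b)] for i in range(a)]
    tot = [[sum(y[i][j]) for j in range(b)] for i in range(a)]     # y_ij.
    n_i = [sum(n[i]) for i in range(a)]
    n_j = [sum(n[i][j] for i in range(a)) for j in range(b)]
    y_i = [sum(tot[i]) for i in range(a)]
    y_j = [sum(tot[i][j] for i in range(a)) for j in range(b)]
    N, g = sum(n_i), sum(y_i)
    raw = sum(v * v for i in range(a) for j in range(b) for v in y[i][j])
    cf = g * g / N
    rows = sum(y_i[i] ** 2 / n_i[i] for i in range(a))
    cols = sum(y_j[j] ** 2 / n_j[j] for j in range(b))
    cells = sum(tot[i][j] ** 2 / n[i][j] for i in range(a) for j in range(b))
    ss = {"SS_T": raw - cf, "SS_A": rows - cf, "SS_B": cols - cf,
          "SS_AB": cells - rows - cols + cf, "SS_E": raw - cells}
    # Definitional interaction SS from the weighted means
    m = [[tot[i][j] / n[i][j] for j in range(b)] for i in range(a)]
    mi = [y_i[i] / n_i[i] for i in range(a)]
    mj = [y_j[j] / n_j[j] for j in range(b)]
    mg = g / N
    ss_ab_def = sum(n[i][j] * (m[i][j] - mi[i] - mj[j] + mg) ** 2
                    for i in range(a) for j in range(b))
    return ss, ss_ab_def
```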


General Case of Unequal Frequencies 

If the sample sizes $n_{ij}$ do not vary considerably, say, by not more than the ratio of 2 to 1, with most $n_{ij}$ nearly equal and no $n_{ij}$ equal to zero, an approximate analysis of variance suggested by Yates (1934), called the method of unweighted means, may be used. This approximate method is also used in cases where the $n_{ij}$'s do differ considerably but the researcher desires a quick initial approximation to a more exact analysis. Note that since $\operatorname{Var}(\bar y_{ij.}) = \sigma_e^2/n_{ij}$, the variances are unequal when $n_{ij}$ is not constant.

The procedure is a rather simple one, in which an analysis of variance is performed using the $\bar y_{ij.}$'s as if there were only one observation for each $(i, j)$-th cell. The sums of squares for the main effects, interaction, and error are calculated in the usual way. Thus, defining $x_{ij} = \bar y_{ij.}$, the expressions for the sums of squares are:

$$SS_{Au} = b\sum_{i=1}^{a}(\bar x_{i.} - \bar x_{..})^2, \qquad SS_{Bu} = a\sum_{j=1}^{b}(\bar x_{.j} - \bar x_{..})^2,$$
$$SS_{ABu} = \sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij} - \bar x_{i.} - \bar x_{.j} + \bar x_{..})^2, \tag{4.10.3}$$

and

$$SS_E = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n_{ij}}(y_{ijk} - \bar y_{ij.})^2,$$

where

$$\bar x_{i.} = \sum_{j=1}^{b} x_{ij}/b, \qquad \bar x_{.j} = \sum_{i=1}^{a} x_{ij}/a,$$

and

$$\bar x_{..} = \sum_{i=1}^{a}\sum_{j=1}^{b} x_{ij}/ab.$$


The analysis of variance is shown in Table 4.6, and the approximate $F$ tests are performed based on the usual ratios of mean squares. It should, however, be noted that the sums of squares $SS_{Au}$, $SS_{Bu}$, and $SS_{ABu}$ are computed on a "mean" basis, whereas $SS_E$ is computed on an "individual" basis. Thus, $MS_E = SS_E/(N-ab)$ is not the correct term with which to test the main effects and interaction mean squares. It must be modified and expressed on a "mean" basis to be comparable to the mean squares for main effects and interaction.

TABLE 4.6
Unweighted Means Analysis for the Unbalanced Fixed Effects Model in (4.1.1) with Disproportional Frequencies

Source of           Degrees of        Sum of     Mean
Variation           Freedom           Squares    Square
Due to A            a - 1             SS_Au      MS_Au
Due to B            b - 1             SS_Bu      MS_Bu
Interaction A x B   (a - 1)(b - 1)    SS_ABu     MS_ABu
Error               N - ab            SS_E       MS_E
Total               N - 1             SS_T

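The expected values of the mean squares are obtained as follows (see, e.g., Searle (1971b, pp. 365-366)):

$$E(MS_{Au}) = \frac{b}{a-1}\sum_{i=1}^{a}\left[\alpha_i + \overline{(\alpha\beta)}_{i.} - \bar\alpha_{.} - \overline{(\alpha\beta)}_{..}\right]^2 + n_h^{-1}\sigma_e^2,$$
$$E(MS_{Bu}) = \frac{a}{b-1}\sum_{j=1}^{b}\left[\beta_j + \overline{(\alpha\beta)}_{.j} - \bar\beta_{.} - \overline{(\alpha\beta)}_{..}\right]^2 + n_h^{-1}\sigma_e^2, \tag{4.10.4}$$
$$E(MS_{ABu}) = \frac{1}{(a-1)(b-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}\left[(\alpha\beta)_{ij} - \overline{(\alpha\beta)}_{i.} - \overline{(\alpha\beta)}_{.j} + \overline{(\alpha\beta)}_{..}\right]^2 + n_h^{-1}\sigma_e^2,$$

and

$$E(MS_E) = \sigma_e^2,$$

where

$$n_h = \left(\frac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b}\frac{1}{n_{ij}}\right)^{-1}. \tag{4.10.5}$$

Note that $n_h$ represents the harmonic mean of all $a \times b$ values $n_{ij}$. The following features of the preceding analysis are worth noting: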


(i) The means of the $x_{ij}$'s are calculated in the usual manner; that is, $\bar x_{i.} = \sum_{j=1}^{b} x_{ij}/b$, and so on.

(ii) The error sum of squares $SS_E$ is calculated exactly as in the case of proportional frequencies.

(iii) The sums of squares do not add up to the total sum of squares. The first three sums of squares (i.e., $SS_{Au}$, $SS_{Bu}$, and $SS_{ABu}$) add up to $\sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij} - \bar x_{..})^2$, but all four do not add up to the total sum of squares.

(iv) The sums of squares $SS_{Au}$, $SS_{Bu}$, and $SS_{ABu}$ do not have chi-square type distributions as in the case of model (4.1.1), nor are they independent.

(v) The sum of squares $SS_E$ is independent of $SS_{Au}$, $SS_{Bu}$, and $SS_{ABu}$, and $SS_E/\sigma_e^2$ has an exact chi-square distribution. All other sums of squares (divided by $\sigma_e^2$) have only approximate noncentral (or, under $H_0$, central) chi-square distributions.
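The unweighted-means computations can be sketched as follows, assuming no empty cells; the function name is hypothetical. The function returns the mean squares of Table 4.6, the harmonic mean $n_h$ of (4.10.5), and $n_h^{-1}MS_E$, which is the "mean basis" version of the error mean square. It also returns the quantity $\sum\sum(x_{ij} - \bar x_{..})^2$, so that feature (iii) can be checked.

```python
from statistics import fmean


def unweighted_means_analysis(y):
    """Unweighted-means analysis (4.10.3) with n_h from (4.10.5).

    y[i][j] is the (non-empty) list of observations in cell (i, j).
    """
    a, b = len(y), len(y[0])
    x = [[fmean(y[i][j]) for j in range(b)] for i in range(a)]      # x_ij = ybar_ij.
    xi = [fmean(x[i]) for i in range(a)]
    xj = [fmean([x[i][j] for i in range(a)]) for j in range(b)]
    xg = fmean(xi)
    ss_au = b * sum((m - xg) ** 2 for m in xi)
    ss_bu = a * sum((m - xg) ** 2 for m in xj)
    ss_abu = sum((x[i][j] - xi[i] - xj[j] + xg) ** 2
                 for i in range(a) for j in range(b))
    ss_e = sum((v - x[i][j]) ** 2
               for i in range(a) for j in range(b) for v in y[i][j])
    N = sum(len(y[i][j]) for i in range(a) for j in range(b))
    n_h = a * b / sum(1.0 / len(y[i][j]) for i in range(a) for j in range(b))
    ms_e = ss_e / (N - a * b)
    return {
        "MS_Au": ss_au / (a - 1),
        "MS_Bu": ss_bu / (b - 1),
        "MS_ABu": ss_abu / ((a - 1) * (b - 1)),
        "MS_E": ms_e,
        "n_h": n_h,
        "MS_E_mean_basis": ms_e / n_h,      # n_h^{-1} MS_E
        "SS": (ss_au, ss_bu, ss_abu, ss_e),
        "SS_cells": sum((x[i][j] - xg) ** 2 for i in range(a) for j in range(b)),
    }
```

When every cell has the same size $n$, the harmonic mean $n_h$ reduces to $n$, and the analysis coincides with the balanced one.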


Since the mean squares in Table 4.6 do not have exact chi-square distributions, their ratios do not provide exact $F$ statistics for testing hypotheses of interest. However, Gosslee and Lucas (1965) indicated that they provide reasonably adequate $F$ statistics using modified degrees of freedom for the numerator mean squares. For example, the modified numerator degrees of freedom for $MS_{Au}/MS_E$ is

$$\nu_A = \frac{(a-1)^2\left(\sum_{i=1}^{a} h_i\right)^2}{\left(\sum_{i=1}^{a} h_i\right)^2 + a(a-2)\sum_{i=1}^{a} h_i^2}, \tag{4.10.6}$$

where the $h_i$ are weights computed from the cell sizes $n_{ij}$ (see Gosslee and Lucas, 1965).

Similarly, Rankin (1974) has shown that the approximate $F$ tests give satisfactory results provided the ratios of sample sizes do not exceed 3. He also investigated the problem of modifying the numerator degrees of freedom to adjust for irregularities in sample sizes. Note that although the amended degrees of freedom (4.10.6) make $MS_{Au}/MS_E$ an approximate $F$ statistic, we observe from (4.10.4) that the hypothesis it tests is the equality of $\alpha_i + \overline{(\alpha\beta)}_{i.}$ for all $i$. Furthermore, since the observation $x_{ij} = \bar y_{ij.}$ has variance $\sigma_e^2/n_{ij}$, the average variance of all the "observations" is

$$\frac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b}\frac{\sigma_e^2}{n_{ij}} = \frac{\sigma_e^2}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b}\frac{1}{n_{ij}} = \frac{\sigma_e^2}{n_h},$$

and the estimated average variance of the "observations" is $MS_E/n_h$, where $n_h$ is the harmonic mean of the $n_{ij}$'s defined in (4.10.5).

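An alternative to the unweighted-means analysis is the weighted-squares-of-means analysis, also proposed by Yates (1934). In this technique, the interaction and error sums of squares are defined as earlier, but the $SS_A$ and $SS_B$ sums of squares are weighted in inverse proportion to their variances, according to the number of observations in the cells. Thus, letting $x_{ij} = \bar y_{ij.}$, the corresponding weighted sums of squares are:

$$SS_{Aw} = \sum_{i=1}^{a} w_{Ai}(\bar x_{i.} - \bar x_A)^2$$

and

$$SS_{Bw} = \sum_{j=1}^{b} w_{Bj}(\bar x_{.j} - \bar x_B)^2,$$

where

$$w_{Ai} = \frac{b^2}{\sum_{j=1}^{b} 1/n_{ij}}, \qquad \bar x_A = \frac{\sum_{i=1}^{a} w_{Ai}\bar x_{i.}}{\sum_{i=1}^{a} w_{Ai}}, \qquad w_{Bj} = \frac{a^2}{\sum_{i=1}^{a} 1/n_{ij}}, \qquad \bar x_B = \frac{\sum_{j=1}^{b} w_{Bj}\bar x_{.j}}{\sum_{j=1}^{b} w_{Bj}},$$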
and $\bar x_{i.}$, $\bar x_{.j}$, and $\bar x_{..}$ are defined as earlier in the case of the unweighted-means analysis. The interaction and error sums of squares are, of course, given as in the unweighted-means analysis; that is,

$$SS_{ABw} = \sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij} - \bar x_{i.} - \bar x_{.j} + \bar x_{..})^2$$

and

$$SS_E = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n_{ij}}(y_{ijk} - \bar y_{ij.})^2.$$

TABLE 4.7
Weighted Means Analysis for the Unbalanced Fixed Effects Model in (4.1.1) with Disproportional Frequencies

Source of           Degrees of        Sum of     Mean
Variation           Freedom           Squares    Square
Due to A            a - 1             SS_Aw      MS_Aw
Due to B            b - 1             SS_Bw      MS_Bw
Interaction A x B   (a - 1)(b - 1)    SS_ABw     MS_ABw
Error               N - ab            SS_E       MS_E
Total               N - 1             SS_T
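The weighted sums of squares $SS_{Aw}$ and $SS_{Bw}$ can be sketched as follows, assuming no empty cells; the function name is hypothetical. The weights are the reciprocals of $\operatorname{Var}(\bar x_{i.})/\sigma_e^2 = b^{-2}\sum_j 1/n_{ij}$ and of its column analogue. With equal cell sizes $n$, every weight equals $bn$ (or $an$). The weighted sums of squares are then exactly $n$ times the unweighted ones, which provides a simple check.

```python
def weighted_squares_of_means(x, counts):
    """SS_Aw and SS_Bw from cell means x[i][j] and cell sizes counts[i][j] > 0."""
    a, b = len(x), len(x[0])
    xi = [sum(x[i]) / b for i in range(a)]                              # xbar_i.
    xj = [sum(x[i][j] for i in range(a)) / a for j in range(b)]        # xbar_.j
    # w_Ai = b^2 / sum_j (1/n_ij) and w_Bj = a^2 / sum_i (1/n_ij)
    w_a = [b * b / sum(1.0 / counts[i][j] for j in range(b)) for i in range(a)]
    w_b = [a * a / sum(1.0 / counts[i][j] for i in range(a)) for j in range(b)]
    x_a = sum(w * m for w, m in zip(w_a, xi)) / sum(w_a)                # xbar_A
    x_b = sum(w * m for w, m in zip(w_b, xj)) / sum(w_b)                # xbar_B
    ss_aw = sum(w * (m - x_a) ** 2 for w, m in zip(w_a, xi))
    ss_bw = sum(w * (m - x_b) ** 2 for w, m in zip(w_b, xj))
    return ss_aw, ss_bw
```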


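The complete analysis of variance is shown in Table 4.7, where the expected values of the mean squares are obtained as follows (see, e.g., Searle (1971b, pp. 369-371)):

$$E(MS_{Aw}) = \frac{1}{a-1}\sum_{i=1}^{a} w_{Ai}\left[\alpha_i + \overline{(\alpha\beta)}_{i.} - \frac{\sum_{i=1}^{a} w_{Ai}\{\alpha_i + \overline{(\alpha\beta)}_{i.}\}}{\sum_{i=1}^{a} w_{Ai}}\right]^2 + \sigma_e^2,$$
$$E(MS_{Bw}) = \frac{1}{b-1}\sum_{j=1}^{b} w_{Bj}\left[\beta_j + \overline{(\alpha\beta)}_{.j} - \frac{\sum_{j=1}^{b} w_{Bj}\{\beta_j + \overline{(\alpha\beta)}_{.j}\}}{\sum_{j=1}^{b} w_{Bj}}\right]^2 + \sigma_e^2,$$
$$E(MS_{ABw}) = \frac{1}{(a-1)(b-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}\left[(\alpha\beta)_{ij} - \overline{(\alpha\beta)}_{i.} - \overline{(\alpha\beta)}_{.j} + \overline{(\alpha\beta)}_{..}\right]^2 + n_h^{-1}\sigma_e^2,$$

and

$$E(MS_E) = \sigma_e^2.$$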


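It can be shown that the variance ratios $MS_{Aw}/MS_E$, $MS_{Bw}/MS_E$, and $MS_{ABw}/(n_h^{-1}MS_E)$ provide exact tests of overall hypotheses concerning the $\alpha_i$'s, $\beta_j$'s, and $(\alpha\beta)_{ij}$'s. When the data are balanced, the null hypotheses being tested are:

$$H_0^{A}: \text{all } \alpha_i = 0, \qquad H_0^{B}: \text{all } \beta_j = 0, \qquad H_0^{AB}: \text{all } (\alpha\beta)_{ij} = 0. \tag{4.10.7}$$

However, when the data are unbalanced, the corresponding null hypotheses being tested are:

$$H_0^{A}: \text{all } \alpha_i + \overline{(\alpha\beta)}_{i.} \text{ are equal}, \qquad H_0^{B}: \text{all } \beta_j + \overline{(\alpha\beta)}_{.j} \text{ are equal},$$
$$H_0^{AB}: \text{all } (\alpha\beta)_{ij} - \overline{(\alpha\beta)}_{i.} - \overline{(\alpha\beta)}_{.j} + \overline{(\alpha\beta)}_{..} \text{ are equal}. \tag{4.10.8}$$

Thus, only with the restrictions

$$\overline{(\alpha\beta)}_{i.} = 0, \quad i = 1, 2, \ldots, a; \qquad \overline{(\alpha\beta)}_{.j} = 0, \quad j = 1, 2, \ldots, b,$$

are the null hypotheses (4.10.7) and (4.10.8) equivalent.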

Federer and Zelen (1966) present another approximate analysis that is more exact, but somewhat more complicated, than the unweighted and weighted analyses discussed here. Still another approximate method, called the method of expected subclass numbers, can be found in Bancroft (1968, pp. 37-41). In situations where the approximate methods are not applicable, for example, badly unbalanced designs (with 10 or more observations in some cells and only a few in others) and designs with empty cells, a method based on multiple regression analysis may be used. The method consists of treating the analysis of variance model as a regression model, fitting the model to the data, obtaining sums of squares for main effects and interactions as regression sums of squares, and using the general inferential techniques for the regression model. There are various methods for carrying out this analysis, and different methods may lead to different results. For example, one has to decide whether to test $SS_A$ adjusted for $\{(\alpha\beta)_{ij}\}$ and $\{\beta_j\}$, or adjusted only for $\{\beta_j\}$ if the $(\alpha\beta)_{ij}$'s are not significant, or to test the unadjusted $SS_A$ against $SS_E$. Furthermore, the sums of squares are no longer orthogonal, and the sequence in which hypotheses involving the fixed effects $\{\alpha_i\}$, $\{\beta_j\}$, and $\{(\alpha\beta)_{ij}\}$ are tested may lead to different results. In addition, a unique partition of the sums of squares does not exist, and the hypotheses being tested do not always correspond to those of a balanced design. For additional details regarding this approach, see Draper and Smith (1981) and Searle (1971b, 1987).


RANDOM EFFECTS ANALYSIS 


The problems of testing hypotheses and estimation of variance components en- 
countered in unbalanced designs of random effects models having two or more 
factors are much more complicated than the corresponding balanced case. We 
again consider two cases for the random effects analysis, one for the case of pro- 
portional frequencies and the other for the general case of unequal frequencies. 


Proportional Frequencies 

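For the case of proportional frequencies, the expected values of the mean squares can be obtained by the Wilk and Kempthorne (1955) formula. For example, letting $n_{ij} = n_{i.}n_{.j}/N$, we obtain (see, e.g., Snedecor and Cochran (1967, pp. 478-483)):

$$E(MS_A) = \sigma_e^2 + \frac{N}{a-1}\left(1 - \sum_{i=1}^{a}\frac{n_{i.}^2}{N^2}\right)\left(\sigma_\alpha^2 + \sum_{j=1}^{b}\frac{n_{.j}^2}{N^2}\,\sigma_{\alpha\beta}^2\right),$$
$$E(MS_B) = \sigma_e^2 + \frac{N}{b-1}\left(1 - \sum_{j=1}^{b}\frac{n_{.j}^2}{N^2}\right)\left(\sigma_\beta^2 + \sum_{i=1}^{a}\frac{n_{i.}^2}{N^2}\,\sigma_{\alpha\beta}^2\right),$$
$$E(MS_{AB}) = \sigma_e^2 + \frac{N}{(a-1)(b-1)}\left(1 - \sum_{i=1}^{a}\frac{n_{i.}^2}{N^2}\right)\left(1 - \sum_{j=1}^{b}\frac{n_{.j}^2}{N^2}\right)\sigma_{\alpha\beta}^2,$$

and

$$E(MS_E) = \sigma_e^2.$$

Approximate tests of hypotheses and variance component estimates can be constructed as earlier. For detailed discussion and numerical examples, see Bancroft (1968, Section 1.6).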


General Case of Unequal Frequencies 
For the general case of unequal frequencies, Hirotsu (1968) proposed approximate $F$ tests for testing the hypotheses:

$$H_0^{A}: \sigma_\alpha^2 = 0 \ \text{ versus } \ H_1^{A}: \sigma_\alpha^2 > 0,$$
$$H_0^{B}: \sigma_\beta^2 = 0 \ \text{ versus } \ H_1^{B}: \sigma_\beta^2 > 0, \tag{4.10.9}$$

and

$$H_0^{AB}: \sigma_{\alpha\beta}^2 = 0 \ \text{ versus } \ H_1^{AB}: \sigma_{\alpha\beta}^2 > 0,$$

by using test statistics analogous to those in the balanced case, where now the mean squares are those obtained in the unweighted-means analysis discussed earlier in this section. Thus, the proposed test statistics are:

$$MS_{Au}/MS_{ABu} \ \text{ for } H_0^{A}, \qquad MS_{Bu}/MS_{ABu} \ \text{ for } H_0^{B}, \tag{4.10.10}$$

and

$$MS_{ABu}/(n_h^{-1}MS_E) \ \text{ for } H_0^{AB},$$

where

$$n_h = \left(\frac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b}\frac{1}{n_{ij}}\right)^{-1}.$$

The test statistics (4.10.10) are to be compared with the $100(1-\alpha)$th percentage points of the $F$ distribution with degrees of freedom $[a-1, (a-1)(b-1)]$, $[b-1, (a-1)(b-1)]$, and $[(a-1)(b-1), N-ab]$, respectively.


Remark: Hirotsu (1968) gave the expressions for the power functions of the tests 
(4.10.10) with numerical examples, which, however, tend to be very complex. Spjotvoll 
(1968) and Thomsen (1975) proposed exact tests for main effects variance components 
under the assumption that the interaction variance component is zero. Khuri and Littell
(1987) developed exact tests of variance components that do not require the assump- 
tion of nonexistence of interaction variance component. Hussein and Milliken (1978a) 
considered tests for main effects variance components in a heteroscedastic situation 
assuming that the interaction variance component is zero. Similarly, Tan et al. (1988) 
reported tests for main effects as well as interaction variance components involving a 
heteroscedastic model. 


For the estimation of variance components, three methods of estimation were initially proposed and studied in some detail by Henderson (1953). The methods were reexamined and presented in elegant matrix notation by Searle (1968). Since then, a variety of new procedures have been developed, and the theory has been extended in a number of different directions. Rao (1971, 1972) introduced the concept of minimum norm quadratic unbiased estimation (MINQUE). Similarly, LaMotte (1973) considered minimum variance quadratic unbiased estimators (MIVQUE), and Pukelsheim (1981) investigated the existence of nonnegative unbiased estimators. For detailed discussions of these and other developments in the field, the reader is referred to Searle et al. (1992) and Rao (1997).

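To illustrate the nature of the problem of estimating variance components in the case of unbalanced cell frequencies, consider an experiment for which the following two-way additive model is appropriate:

$$y_{ijk} = \mu + \alpha_i + \beta_j + e_{ijk}, \qquad i = 1, 2, \ldots, a; \quad j = 1, 2, \ldots, b; \quad k = 1, 2, \ldots, n_{ij}, \tag{4.10.11}$$

where the $\alpha_i$'s, $\beta_j$'s, and $e_{ijk}$'s are uncorrelated random variables with mean zero and variances $\sigma_\alpha^2$, $\sigma_\beta^2$, and $\sigma_e^2$, respectively. Let the total sum of squares be partitioned as follows:

$$\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n_{ij}} y_{ijk}^2 - \frac{y_{...}^2}{N} = \left[\sum_{i=1}^{a}\frac{y_{i..}^2}{n_{i.}} - \frac{y_{...}^2}{N}\right] + \left[\sum_{j=1}^{b}\frac{y_{.j.}^2}{n_{.j}} - \frac{y_{...}^2}{N}\right] + SS_E,$$

or

$$SS_T = SS_A + SS_B + SS_E,$$

where

$$SS_E = SS_T - SS_A - SS_B.$$

Note that $SS_E$ may be negative. The derivation of the expected values of the mean squares is complicated, and the results may be shown to be those given in Table 4.8 (see, e.g., Graybill (1961, pp. 360-362)), where the coefficients of the variance components are determined as follows:

$$c_{aa} = \frac{1}{a-1}\left(N - \sum_{i=1}^{a}\frac{n_{i.}^2}{N}\right), \qquad c_{ab} = \frac{1}{a-1}\left(\sum_{i=1}^{a}\sum_{j=1}^{b}\frac{n_{ij}^2}{n_{i.}} - \sum_{j=1}^{b}\frac{n_{.j}^2}{N}\right),$$
$$c_{bb} = \frac{1}{b-1}\left(N - \sum_{j=1}^{b}\frac{n_{.j}^2}{N}\right), \qquad c_{ba} = \frac{1}{b-1}\left(\sum_{i=1}^{a}\sum_{j=1}^{b}\frac{n_{ij}^2}{n_{.j}} - \sum_{i=1}^{a}\frac{n_{i.}^2}{N}\right),$$

and

$$c_{ea} = -\frac{(b-1)\,c_{ba}}{N-a-b+1}, \qquad c_{eb} = -\frac{(a-1)\,c_{ab}}{N-a-b+1}.$$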
TABLE 4.8
Analysis of Variance for Model (4.10.11)

Source of    Degrees of        Sum of     Mean      Expected
Variation    Freedom           Squares    Square    Mean Square
Due to A     a - 1             SS_A       MS_A      $\sigma_e^2 + c_{ab}\sigma_\beta^2 + c_{aa}\sigma_\alpha^2$
Due to B     b - 1             SS_B       MS_B      $\sigma_e^2 + c_{bb}\sigma_\beta^2 + c_{ba}\sigma_\alpha^2$
Error        N - a - b + 1     SS_E       MS_E      $\sigma_e^2 + c_{eb}\sigma_\beta^2 + c_{ea}\sigma_\alpha^2$
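The partition for model (4.10.11) can be computed directly from the cell lists, and doing so shows concretely how $SS_E = SS_T - SS_A - SS_B$ can turn negative. This is a minimal sketch with a hypothetical function name. It allows empty cells as long as every row and column has at least one observation.

```python
def additive_partition(y):
    """SS_T, SS_A, SS_B and SS_E = SS_T - SS_A - SS_B for model (4.10.11).

    y[i][j] is the list of observations in cell (i, j); empty cells are
    allowed if every row and column contains at least one observation.
    """
    a, b = len(y), len(y[0])
    n_i = [sum(len(y[i][j]) for j in range(b)) for i in range(a)]
    n_j = [sum(len(y[i][j]) for i in range(a)) for j in range(b)]
    y_i = [sum(sum(y[i][j]) for j in range(b)) for i in range(a)]
    y_j = [sum(sum(y[i][j]) for i in range(a)) for j in range(b)]
    N, g = sum(n_i), sum(y_i)
    cf = g * g / N                                        # y_...^2 / N
    ss_t = sum(v * v for row in y for cell in row for v in cell) - cf
    ss_a = sum(t * t / m for t, m in zip(y_i, n_i)) - cf  # unadjusted for B
    ss_b = sum(t * t / m for t, m in zip(y_j, n_j)) - cf  # unadjusted for A
    return ss_t, ss_a, ss_b, ss_t - ss_a - ss_b
```

With cells $(1,1) = \{0, 0\}$, $(1,2) = \{0.5\}$, $(2,1) = \{0.5\}$, and $(2,2) = \{1, 1\}$, the row and column effects are heavily confounded. The sketch gives $SS_T = 1$ and $SS_A = SS_B = 2/3$, so $SS_E = -1/3$. For balanced data, $SS_E$ reduces to the interaction plus within-cell sums of squares of the two-way analysis.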


If one desires to estimate the variance components by the analysis of variance method, that is, by equating mean squares to their corresponding expected values, one obtains the following system of equations:

$$MS_A = \hat\sigma_e^2 + c_{ab}\hat\sigma_\beta^2 + c_{aa}\hat\sigma_\alpha^2,$$
$$MS_B = \hat\sigma_e^2 + c_{bb}\hat\sigma_\beta^2 + c_{ba}\hat\sigma_\alpha^2, \tag{4.10.12}$$
$$MS_E = \hat\sigma_e^2 + c_{eb}\hat\sigma_\beta^2 + c_{ea}\hat\sigma_\alpha^2,$$

where the parameters have been replaced by their respective estimators. The solution of the system of equations (4.10.12) provides a set of estimators of the variance components. The evaluation of explicit expressions for the estimators is somewhat involved, and the results can be found in Searle (1971b, p. 487) and Searle et al. (1992, p. 439). The estimators obtained are unbiased and consistent, but their other optimality properties are still being explored. Although sampling variances can be obtained, other distributional properties cannot, since even under the normality assumptions, the distribution of the estimators is unknown.
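Solving (4.10.12) numerically is straightforward once the coefficients are available. In this hedged sketch, the coefficients are obtained by taking expectations of $SS_A$, $SS_B$, and $SS_E = SS_T - SS_A - SS_B$ under model (4.10.11); they are derived for this example, not quoted from the references cited above. The 3-by-3 system is then solved by Cramer's rule. Both function names are hypothetical.

```python
def anova_coefficients(counts):
    """Coefficients of Table 4.8 for model (4.10.11), from cell counts n_ij.

    Derived by taking expectations of SS_A, SS_B and SS_E = SS_T - SS_A - SS_B.
    """
    a, b = len(counts), len(counts[0])
    n_i = [sum(r) for r in counts]
    n_j = [sum(counts[i][j] for i in range(a)) for j in range(b)]
    N = sum(n_i)
    s_i = sum(m * m for m in n_i) / N                      # sum_i n_i.^2 / N
    s_j = sum(m * m for m in n_j) / N                      # sum_j n_.j^2 / N
    k_a = sum(counts[i][j] ** 2 / n_i[i] for i in range(a) for j in range(b))
    k_b = sum(counts[i][j] ** 2 / n_j[j] for i in range(a) for j in range(b))
    df_e = N - a - b + 1
    return {
        "c_aa": (N - s_i) / (a - 1), "c_ab": (k_a - s_j) / (a - 1),
        "c_bb": (N - s_j) / (b - 1), "c_ba": (k_b - s_i) / (b - 1),
        "c_ea": -(k_b - s_i) / df_e, "c_eb": -(k_a - s_j) / df_e,
    }


def solve_components(ms_a, ms_b, ms_e, c):
    """Solve (4.10.12) for (sigma_e^2, sigma_beta^2, sigma_alpha^2) by Cramer's rule."""
    m = [[1.0, c["c_ab"], c["c_aa"]],
         [1.0, c["c_bb"], c["c_ba"]],
         [1.0, c["c_eb"], c["c_ea"]]]
    rhs = [ms_a, ms_b, ms_e]

    def det3(q):
        return (q[0][0] * (q[1][1] * q[2][2] - q[1][2] * q[2][1])
                - q[0][1] * (q[1][0] * q[2][2] - q[1][2] * q[2][0])
                + q[0][2] * (q[1][0] * q[2][1] - q[1][1] * q[2][0]))

    d = det3(m)
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]          # replace one column by the mean squares
        solution.append(det3(mc) / d)
    return tuple(solution)               # (sigma_e^2, sigma_beta^2, sigma_alpha^2)
```

As a sanity check, balanced counts $n_{ij} = n$ give $c_{aa} = bn$, $c_{bb} = an$, and all cross coefficients equal to zero, which recovers the familiar balanced estimators. Like the analysis of variance estimators in general, the solution can contain negative values.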

The only functions of variance components for which exact intervals can be obtained are $\sigma_e^2$ and $\sigma_{\alpha\beta}^2/\sigma_e^2$. For a discussion of the problem of setting confidence intervals for the individual variance components, certain ratios of variance components, and proportions of variability, including numerical examples, see Burdick and Graybill (1992, pp. 136-145).


MIXED EFFECTS ANALYSIS 


Most of the inferential difficulties that are encountered occur in the mixed effects 
model. The treatment of the unbalanced mixed model is beyond the scope of 
this volume. The interested reader is referred to Searle (1971b, pp. 429-431; 
1987, Chapter 13; 1988), Stroup (1989), McLean et al. (1991), Hocking (1993), 
and Khuri et al. (1998). Smith (1951) discusses the tests of hypotheses for the 
mixed model with proportional frequencies. For a discussion of exact tests for 
the random and fixed effects in an unbalanced two-way crossed classification 
model, see Gallo and Khuri (1990). Burdick and Graybill (1992, p. 172) give a numerical example illustrating the computation of an exact interval for $\sigma_{\alpha\beta}^2/\sigma_e^2$ and an approximate interval for $\sigma_\beta^2$.


4.11 POWER OF THE ANALYSIS OF VARIANCE F TESTS 


The power of the analysis of variance F tests for AB interactions, factor B 
effects, and factor A effects can be evaluated in a manner similar to the case of 
one-way classification. The results on power calculations are briefly summa- 
rized in the following. 


MODEL I (FIXED EFFECTS) 


The parameter $\phi$ and the appropriate degrees of freedom for each of the tests are as follows.


Test for AB Interactions 


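$$\text{Power} = P\{F'[\nu_1, \nu_2; \phi] > F[\nu_1, \nu_2; 1-\alpha]\},$$

where

$$\nu_1 = (a-1)(b-1), \qquad \nu_2 = ab(n-1),$$

and

$$\phi = \frac{1}{\sigma_e}\sqrt{\frac{n\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2}{(a-1)(b-1)+1}}.$$
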
Test for Factor B Effects 
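$$\text{Power} = P\{F'[\nu_1, \nu_2; \phi] > F[\nu_1, \nu_2; 1-\alpha]\},$$

where

$$\nu_1 = b-1, \qquad \nu_2 = ab(n-1),$$

and

$$\phi = \frac{1}{\sigma_e}\sqrt{\frac{an\sum_{j=1}^{b}\beta_j^2}{b}}.$$
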
Test for Factor A Effects 
$$\text{Power} = P\{F'[\nu_1, \nu_2; \phi] > F[\nu_1, \nu_2; 1-\alpha]\},$$

where

$$\nu_1 = a-1, \qquad \nu_2 = ab(n-1),$$

and

$$\phi = \frac{1}{\sigma_e}\sqrt{\frac{bn\sum_{i=1}^{a}\alpha_i^2}{a}}.$$


Remark: Kastenbaum et al. (1970b) gave tables showing how large $n$ must be ($1 < n < 5$) for $a = 2(1)6$ and $b = 2(1)5$ in testing for factor A effects with $\alpha = 0.05, 0.01$ and $1 - \beta = 0.7, 0.8, 0.9, 0.95, 0.99, 0.995$, when $\max|\alpha_i - \alpha_{i'}|/\sigma_e$ is given. More extensive tables are given by Bowman (1972) and Bowman and Kastenbaum (1975).
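The three Model I expressions for $\phi$ share one pattern: an effect sum of squares, multiplied by the number of observations behind each mean, divided by $\nu_1 + 1$ and by $\sigma_e^2$, all under a square root. The small Python sketch below encodes them. The helper names are hypothetical, the effects are assumed to satisfy the usual sum-to-zero constraints, and the resulting $\phi$ would still be entered into the Pearson-Hartley charts (or converted to the noncentrality $\lambda = \phi^2(\nu_1 + 1)$ for software).

```python
import math


def phi_factor_a(alpha, b, n, sigma2_e):
    """phi for the Model I test of factor A: sqrt(bn * sum(alpha_i^2) / a) / sigma_e."""
    a = len(alpha)
    return math.sqrt(b * n * sum(x * x for x in alpha) / a / sigma2_e)


def phi_factor_b(beta, a, n, sigma2_e):
    """phi for the Model I test of factor B: sqrt(an * sum(beta_j^2) / b) / sigma_e."""
    b = len(beta)
    return math.sqrt(a * n * sum(x * x for x in beta) / b / sigma2_e)


def phi_interaction(ab, n, sigma2_e):
    """phi for the Model I test of AB interaction; ab is an a x b nested list."""
    a, b = len(ab), len(ab[0])
    ss = sum(x * x for row in ab for x in row)
    return math.sqrt(n * ss / ((a - 1) * (b - 1) + 1) / sigma2_e)


def noncentrality(phi, nu1):
    """Noncentrality parameter lambda = phi^2 (nu1 + 1)."""
    return phi * phi * (nu1 + 1)


# Hypothetical effects for a = 3, b = 2, n = 4 and sigma_e^2 = 1.
print(phi_factor_a([-1.0, 0.0, 1.0], b=2, n=4, sigma2_e=1.0))
```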


MODEL II (RANDOM EFFECTS) 


The power calculations involve only the central F distribution. The results for 
each of the tests are as follows. 


Test for AB Interactions


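$$\text{Power} = P\{F[\nu_1, \nu_2] > \lambda^{-2} F[\nu_1, \nu_2; 1-\alpha]\},$$

where

$$\nu_1 = (a-1)(b-1), \qquad \nu_2 = ab(n-1),$$

and

$$\lambda = \sqrt{1 + \frac{n\sigma_{\alpha\beta}^2}{\sigma_e^2}}.$$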


Test for Factor B Effects 
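$$\text{Power} = P\{F[\nu_1, \nu_2] > \lambda^{-2} F[\nu_1, \nu_2; 1-\alpha]\},$$

where

$$\nu_1 = b-1, \qquad \nu_2 = (a-1)(b-1),$$

and

$$\lambda = \sqrt{1 + \frac{an\sigma_\beta^2}{\sigma_e^2 + n\sigma_{\alpha\beta}^2}}.$$
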

Test for Factor A Effects 
$$\text{Power} = P\{F[\nu_1, \nu_2] > \lambda^{-2} F[\nu_1, \nu_2; 1-\alpha]\},$$

where

$$\nu_1 = a-1, \qquad \nu_2 = (a-1)(b-1),$$

and

$$\lambda = \sqrt{1 + \frac{bn\sigma_\alpha^2}{\sigma_e^2 + n\sigma_{\alpha\beta}^2}}.$$


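The power of the tests of hypotheses of the type $\sigma_{\alpha\beta}^2/\sigma_e^2 \le \rho_1$, $\sigma_\beta^2/(\sigma_e^2 + n\sigma_{\alpha\beta}^2) \le \rho_2$, or $\sigma_\alpha^2/(\sigma_e^2 + n\sigma_{\alpha\beta}^2) \le \rho_3$ can similarly be expressed in terms of the central F distribution.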
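Because the Model II power expressions involve only the central F distribution, they can be evaluated from an F distribution function rather than read from charts. The sketch below is a plain-Python illustration under that assumption. It computes the central F distribution through the regularized incomplete beta function (the continued-fraction method described in Numerical Recipes) and inverts it by bisection. The function names and the variance ratio in the usage lines are hypothetical.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_cdf(f, nu1, nu2):
    """P{F[nu1, nu2] <= f} for the central F distribution."""
    if f <= 0.0:
        return 0.0
    return betainc_reg(nu1 / 2.0, nu2 / 2.0, nu1 * f / (nu1 * f + nu2))


def f_quantile(p, nu1, nu2, tol=1e-10):
    """F[nu1, nu2; p], found by bisection on f_cdf."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, nu1, nu2) < p:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, nu1, nu2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_model2(lam, nu1, nu2, alpha=0.05):
    """Power = P{F[nu1, nu2] > lambda^{-2} F[nu1, nu2; 1 - alpha]}."""
    crit = f_quantile(1.0 - alpha, nu1, nu2)
    return 1.0 - f_cdf(crit / lam ** 2, nu1, nu2)


# AB test with a = 3, b = 4, n = 2 and a hypothetical sigma2_ab / sigma2_e = 1.
a, b, n = 3, 4, 2
lam = math.sqrt(1 + n * 1.0)
print(power_model2(lam, (a - 1) * (b - 1), a * b * (n - 1)))
```

As a sanity check, setting $\lambda = 1$ (a zero variance component) returns a power equal to the significance level, and `f_quantile(0.95, 2, 9)` reproduces the tabulated F[2, 9; 0.95] = 4.26 used in the worked example of Section 4.13.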


MODEL III (MIXED EFFECTS) 


The power of the tests for AB interactions and B effects involves central F 
distributions and for A effects involves the noncentral F distribution. The results 
are as follows. 


Test for AB Interactions 


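$$\text{Power} = P\{F[\nu_1, \nu_2] > \lambda^{-2} F[\nu_1, \nu_2; 1-\alpha]\},$$

where

$$\nu_1 = (a-1)(b-1), \qquad \nu_2 = ab(n-1),$$

and

$$\lambda = \sqrt{1 + \frac{n\sigma_{\alpha\beta}^2}{\sigma_e^2}}.$$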


Test for Factor B Effects 


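$$\text{Power} = P\{F[\nu_1, \nu_2] > \lambda^{-2} F[\nu_1, \nu_2; 1-\alpha]\},$$

where

$$\nu_1 = b-1, \qquad \nu_2 = ab(n-1),$$

and

$$\lambda = \sqrt{1 + \frac{an\sigma_\beta^2}{\sigma_e^2}}.$$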


Test for Factor A Effects 
$$\text{Power} = P\{F'[\nu_1, \nu_2; \phi] > F[\nu_1, \nu_2; 1-\alpha]\},$$

where

$$\nu_1 = a-1, \qquad \nu_2 = (a-1)(b-1),$$

and

$$\phi = \sqrt{\frac{bn\sum_{i=1}^{a}\alpha_i^2}{a(\sigma_e^2 + n\sigma_{\alpha\beta}^2)}}.$$


The power of the tests of hypotheses of the type $\sigma_{\alpha\beta}^2/\sigma_e^2 \le \rho_1$ or $\sigma_\beta^2/\sigma_e^2 \le \rho_2$ can similarly be expressed in terms of the central F distribution.


4.12 MULTIPLE COMPARISON METHODS 


Usually, more than one comparison is of interest, and the multiple comparison procedures discussed in Section 2.19 can be employed with only minor modifications. The procedures can be used for the fixed as well as the mixed effects models. Most comparisons concern control of the error rate $\alpha$ for each separate family of F tests, that is, the two main effects and the cell means. One may want to control $\alpha$ for the entire experiment comprising all three families of tests, but this is rarely of interest.

For example, under Model I, if $H_0^{AB}$ is rejected, we would be interested in comparing the cell means $\mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$. Then Tukey's or Scheffé's method may be used to investigate contrasts of the type

$$L = \mu_{ij} - \mu_{i'j'}$$

among all cell means, where $L$ is estimated by

$$\hat{L} = \bar{y}_{ij.} - \bar{y}_{i'j'.}.$$
Now, the procedure is equivalent to the one for the one-way classification model, with the total number of treatments here being equal to $r = ab$ and the degrees of freedom for $MS_E$ equal to $ab(n-1)$. Thus, suppose that $\bar{y}_{ij.}$ is larger than $\bar{y}_{i'j'.}$. Then, using Tukey's procedure, $L$ is significantly different from zero with confidence coefficient $1-\alpha$ if

$$\frac{\hat{L}}{\sqrt{MS_E/n}} > q[ab, ab(n-1); 1-\alpha].$$

If Scheffé's method is applied to these comparisons, then $L$ is significantly different from zero with confidence coefficient $1-\alpha$ if

$$\frac{\hat{L}}{\sqrt{(ab-1)MS_E(2/n)}} > \{F[ab-1, ab(n-1); 1-\alpha]\}^{1/2}.$$

If $H_0^{AB}$ is not rejected, we usually would proceed to test $H_0^{A}$ and $H_0^{B}$. If $H_0^{A}$ or $H_0^{B}$ is rejected, Tukey's or Scheffé's method may be used to study contrasts
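among the $\alpha_i$'s or $\beta_j$'s of the form

$$L = \sum_{i=1}^{a}\ell_i\alpha_i \qquad \text{or} \qquad L' = \sum_{j=1}^{b}\ell'_j\beta_j, \qquad \text{with } \sum_{i=1}^{a}\ell_i = 0 = \sum_{j=1}^{b}\ell'_j,$$

where $L$ and $L'$ are estimated unbiasedly by

$$\hat{L} = \sum_{i=1}^{a}\ell_i\bar{y}_{i..} \qquad \text{and} \qquad \hat{L}' = \sum_{j=1}^{b}\ell'_j\bar{y}_{.j.}.$$

Then, using Tukey's method, $L$ is significant at the $\alpha$-level if

$$\frac{\hat{L}}{\sqrt{(bn)^{-1}MS_E}\left(\frac{1}{2}\sum_{i=1}^{a}|\ell_i|\right)} > q[a, ab(n-1); 1-\alpha].$$

Similarly, if Scheffé's method is used, $L$ is significant at the $\alpha$-level if

$$\frac{\hat{L}}{\sqrt{(a-1)(bn)^{-1}MS_E\left(\sum_{i=1}^{a}\ell_i^2\right)}} > \{F[a-1, ab(n-1); 1-\alpha]\}^{1/2}.$$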


Likewise, the significance of the contrast L’ can be tested. 
If one wishes to construct intervals for $L$, then, using Tukey's method, a $100(1-\alpha)$ percent simultaneous confidence interval for $L$ is given by

$$\hat{L} - T\sqrt{(bn)^{-1}MS_E}\left(\frac{1}{2}\sum_{i=1}^{a}|\ell_i|\right) < L < \hat{L} + T\sqrt{(bn)^{-1}MS_E}\left(\frac{1}{2}\sum_{i=1}^{a}|\ell_i|\right), \tag{4.12.1}$$

where

$$T = q[a, ab(n-1); 1-\alpha].$$


232 The Analysis of Variance 
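In particular, for a pairwise contrast, the Tukey interval is

$$\bar{y}_{i..} - \bar{y}_{i'..} - T\sqrt{(bn)^{-1}MS_E} < \alpha_i - \alpha_{i'} < \bar{y}_{i..} - \bar{y}_{i'..} + T\sqrt{(bn)^{-1}MS_E}.$$

Using Scheffé's method, the interval will be

$$\hat{L} - S\sqrt{(a-1)(bn)^{-1}MS_E\left(\sum_{i=1}^{a}\ell_i^2\right)} < L < \hat{L} + S\sqrt{(a-1)(bn)^{-1}MS_E\left(\sum_{i=1}^{a}\ell_i^2\right)}, \tag{4.12.2}$$

where

$$S^2 = F[a-1, ab(n-1); 1-\alpha].$$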


The Bonferroni-type confidence interval based on the $t$ distribution is obtained as

$$\hat{L} - t[ab(n-1); 1-\alpha/2m]\sqrt{(bn)^{-1}MS_E\sum_{i=1}^{a}\ell_i^2} < L < \hat{L} + t[ab(n-1); 1-\alpha/2m]\sqrt{(bn)^{-1}MS_E\sum_{i=1}^{a}\ell_i^2},$$

where $m$ is the number of intervals constructed, with an overall confidence level of at least $1-\alpha$. Similar confidence intervals can be given for $L'$.
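Once a critical value has been read from the appropriate table, the Tukey, Scheffé, and Bonferroni intervals above differ only in the multiplier and in how the contrast coefficients enter. The Python sketch below computes the three intervals for a contrast among the factor A means. The tabulated critical values (q, F, t) are passed in rather than computed, `ms` is $MS_E$ here (it would be $MS_{AB}$ in the Model III case treated below), and the function names and usage numbers are illustrative.

```python
import math


def contrast_estimate(coeffs, means):
    """L-hat = sum(l_i * ybar_i..) for a contrast whose coefficients sum to zero."""
    if abs(sum(coeffs)) > 1e-9:
        raise ValueError("contrast coefficients must sum to zero")
    return sum(l * m for l, m in zip(coeffs, means))


def tukey_interval(coeffs, means, ms, b, n, q):
    """Interval (4.12.1): L-hat +/- q * sqrt(ms / (bn)) * (1/2) * sum|l_i|."""
    est = contrast_estimate(coeffs, means)
    half = q * math.sqrt(ms / (b * n)) * 0.5 * sum(abs(l) for l in coeffs)
    return est - half, est + half


def scheffe_interval(coeffs, means, ms, b, n, f_crit):
    """Interval (4.12.2): L-hat +/- sqrt(F * (a - 1) * ms * sum(l_i^2) / (bn))."""
    a = len(coeffs)
    est = contrast_estimate(coeffs, means)
    half = math.sqrt(f_crit * (a - 1) * ms * sum(l * l for l in coeffs) / (b * n))
    return est - half, est + half


def bonferroni_interval(coeffs, means, ms, b, n, t_crit):
    """Bonferroni interval: L-hat +/- t * sqrt(ms * sum(l_i^2) / (bn))."""
    est = contrast_estimate(coeffs, means)
    half = t_crit * math.sqrt(ms * sum(l * l for l in coeffs) / (b * n))
    return est - half, est + half


# Hypothetical factor A means and critical values, only to show the calls.
means = [10.0, 7.0, 5.0]
print(tukey_interval([1, -1, 0], means, ms=4.0, b=2, n=2, q=3.5))
print(scheffe_interval([1, -1, 0], means, ms=4.0, b=2, n=2, f_crit=4.0))
```

For a pairwise contrast, $\frac{1}{2}\sum|\ell_i| = 1$, so the Tukey half-width reduces to the $T\sqrt{(bn)^{-1}MS_E}$ form given above.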

When a design is slightly unbalanced and one uses the unweighted-means analysis, the foregoing Tukey procedure can be used by replacing $n$ by $n_h$ given by (4.10.5). The coverage probability should then be approximately $1-\alpha$; however, as the design becomes more unbalanced, the coverage probability deteriorates.

Under Model III, the contrasts of interest involve only the $\alpha_i$'s, and if $H_0^{A}$ is rejected, Tukey's or Scheffé's method can be employed to investigate contrasts of the type $\sum_{i=1}^{a}\ell_i\alpha_i$. For example, suppose we wish to obtain all pairwise comparisons between the $\alpha_i$'s by means of Tukey's method. Then

$$L = \alpha_i - \alpha_{i'},$$


Two-Way Crossed Classification with Interaction 233 


and

$$\widehat{\mathrm{Var}}(\hat{L}) = 2MS_{AB}/bn.$$

The value of $T$ in this case will be

$$T = q[a, (a-1)(b-1); 1-\alpha],$$

leading to the interval

$$\bar{y}_{i..} - \bar{y}_{i'..} - T\sqrt{(bn)^{-1}MS_{AB}} < \alpha_i - \alpha_{i'} < \bar{y}_{i..} - \bar{y}_{i'..} + T\sqrt{(bn)^{-1}MS_{AB}}. \tag{4.12.3}$$

For the Scheffé interval, we will have

$$\hat{L} - S\sqrt{(a-1)(bn)^{-1}MS_{AB}\left(\sum_{i=1}^{a}\ell_i^2\right)} < L < \hat{L} + S\sqrt{(a-1)(bn)^{-1}MS_{AB}\left(\sum_{i=1}^{a}\ell_i^2\right)}, \tag{4.12.4}$$

where

$$S^2 = F[a-1, (a-1)(b-1); 1-\alpha].$$

For just a single confidence interval, one can use $\sqrt{2}\,t[(a-1)(b-1); 1-\alpha/2]$ in place of $T$. For a limited number of comparisons $k$, the Bonferroni intervals are obtained by using $\sqrt{2}\,t[(a-1)(b-1); 1-\alpha/2k]$ instead of $T$. For $k < a(a-1)/2$, the Bonferroni intervals are usually shorter than the Tukey intervals.


4.13 WORKED EXAMPLE FOR MODEL I


Steel and Torrie (1980, pp. 217-218) reported data (courtesy of A.C. Linnerud, 
North Carolina State University) on times (in seconds) to complete a 1.5-mile 
course. All the runners were men classified in three age groups and in three 
fitness categories. The data form a two-way classification and are shown in 
Table 4.9. The example just described can be regarded as a two-way fixed effects model, since the three age groups and the three fitness categories were specifically chosen by the researcher as being of particular interest, and thus both factors have systematic effects. Since there are two observations for each combination of factor levels, the experimenter can test for the presence of interaction effects. If there were just one observation for each combination of levels, either the absence of interaction would have to be assumed or its presence would be confounded with the error term, so that it could not be estimated separately.
TABLE 4.9
Running Time (in seconds) to
Complete a 1.5-Mile Course


                  Fitness Category
Age Group     Low       Medium    High

40            669       602       527
              671       603       547

50            775       684       571
              821       687       573

60            1,009     824       688
              1,060     828       713


Source: Steel and Torrie (1980, p. 218). Used
with permission.


The mathematical model for this experiment would be

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}, \qquad i = 1, 2, 3;\ j = 1, 2, 3;\ k = 1, 2,$$

where $\mu$ is the general mean, $\alpha_i$ is the effect of the $i$-th age group ($\sum_{i=1}^{3}\alpha_i = 0$), $\beta_j$ is the effect of the $j$-th fitness category ($\sum_{j=1}^{3}\beta_j = 0$), $(\alpha\beta)_{ij}$ is the fixed interaction effect of the $i$-th age group with the $j$-th fitness category ($\sum_{i=1}^{3}(\alpha\beta)_{ij} = 0 = \sum_{j=1}^{3}(\alpha\beta)_{ij}$), and the $e_{ijk}$'s are experimental errors assumed to be independently and normally distributed, each with mean zero and variance $\sigma_e^2$.
The following computations will lead to the analysis of variance table. 


(i) The cell totals:

$y_{11.}$ = 1,340, $y_{12.}$ = 1,205, $y_{13.}$ = 1,074;
$y_{21.}$ = 1,596, $y_{22.}$ = 1,371, $y_{23.}$ = 1,144;
$y_{31.}$ = 2,069, $y_{32.}$ = 1,652, $y_{33.}$ = 1,401.

(ii) The row (age) totals:

$y_{1..}$ = 3,619, $y_{2..}$ = 4,111, $y_{3..}$ = 5,122.

(iii) The column (fitness) totals:

$y_{.1.}$ = 5,005, $y_{.2.}$ = 4,228, $y_{.3.}$ = 3,619.

(iv) The grand total:

$y_{...}$ = 3,619 + 4,111 + 5,122 = 12,852.

(v) $\sum_{i=1}^{3}\sum_{j=1}^{3}\sum_{k=1}^{2} y_{ijk}^2$ = (669)² + (671)² + ⋯ + (713)² = 9,557,568.

(vi) $y_{...}^2/(3 \times 3 \times 2)$ = (12,852)²/18 = 9,176,328.

(vii) $\frac{1}{3 \times 2}\sum_{i=1}^{3} y_{i..}^2$ = [(3,619)² + (4,111)² + (5,122)²]/6 = 9,372,061.

(viii) $\frac{1}{3 \times 2}\sum_{j=1}^{3} y_{.j.}^2$ = [(5,005)² + (4,228)² + (3,619)²]/6 = 9,337,195.

(ix) $\frac{1}{2}\sum_{i=1}^{3}\sum_{j=1}^{3} y_{ij.}^2$ = [(1,340)² + (1,205)² + ⋯ + (1,401)²]/2 = 9,554,680.

(x) $SS_T$ = 9,557,568 − 9,176,328 = 381,240.
(xi) $SS_{TC}$ = 9,554,680 − 9,176,328 = 378,352.
(xii) $SS_A$ = 9,372,061 − 9,176,328 = 195,733.
(xiii) $SS_B$ = 9,337,195 − 9,176,328 = 160,867.
(xiv) $SS_{AB}$ = 378,352 − 195,733 − 160,867 = 21,752.
(xv) $SS_E$ = 381,240 − 378,352 = 2,888.
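Steps (i) through (xv) can be reproduced with a short script. The following plain-Python sketch, with the Table 4.9 data typed in, applies the same correction-term formulas and also forms the mean squares and Model I F ratios that appear in Table 4.10. It is meant as an arithmetic check for this balanced layout, not as a general ANOVA routine; the function name is hypothetical.

```python
# Table 4.9: data[i][j] lists the two running times (seconds) for
# age group i (40, 50, 60) and fitness category j (low, medium, high).
data = [
    [[669, 671], [602, 603], [527, 547]],
    [[775, 821], [684, 687], [571, 573]],
    [[1009, 1060], [824, 828], [688, 713]],
]


def two_way_anova(data):
    """Balanced two-way ANOVA with interaction via correction-term formulas."""
    a, b, n = len(data), len(data[0]), len(data[0][0])
    cells = [[sum(cell) for cell in row] for row in data]
    rows = [sum(r) for r in cells]
    cols = [sum(cells[i][j] for i in range(a)) for j in range(b)]
    grand = sum(rows)
    ct = grand ** 2 / (a * b * n)  # correction term, step (vi)
    sum_sq = sum(y * y for row in data for cell in row for y in cell)
    ss_t = sum_sq - ct
    ss_a = sum(r * r for r in rows) / (b * n) - ct
    ss_b = sum(c * c for c in cols) / (a * n) - ct
    ss_cells = sum(c * c for row in cells for c in row) / n - ct
    ss_ab = ss_cells - ss_a - ss_b
    ss_e = ss_t - ss_cells
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (n - 1)}
    ss = {"A": ss_a, "B": ss_b, "AB": ss_ab, "E": ss_e, "T": ss_t}
    ms = {k: ss[k] / df[k] for k in df}
    # Model I: every effect is tested against the error mean square.
    f = {k: ms[k] / ms["E"] for k in ("A", "B", "AB")}
    return ss, ms, f


ss, ms, f = two_way_anova(data)
for k in ("A", "B", "AB"):
    print(k, round(ss[k], 3), round(ms[k], 3), round(f[k], 2))
print("E", round(ss["E"], 3), round(ms["E"], 3))
```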


These results, along with the remaining analysis of variance computations, are summarized in Table 4.10. If we choose the level of significance $\alpha = 0.05$, we find from Appendix Table V that


F[2, 9;0.95] = 4.26, 
and 


F[4, 9; 0.95] = 3.63. 


Comparing these values with the computed F values given in Table 4.10, we 
may reach the following conclusions: 


(a) Reject the hypothesis of no interaction effects and conclude that there 
is strong evidence of interaction between the different age groups and 
the different fitness categories (p < 0.001). 
TABLE 4.10

Analysis of Variance for the Running Time Data of Table 4.9

Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | Expected Mean Square | F Value | p-Value
Age group | 2 | 195,733 | 97,866.500 | $\sigma_e^2 + \frac{3 \times 2}{3-1}\sum_{i=1}^{3}\alpha_i^2$ | 304.99 | <0.001
Fitness category | 2 | 160,867 | 80,433.500 | $\sigma_e^2 + \frac{3 \times 2}{3-1}\sum_{j=1}^{3}\beta_j^2$ | 250.66 | <0.001
Interaction | 4 | 21,752 | 5,438.000 | $\sigma_e^2 + \frac{2}{(3-1)(3-1)}\sum_{i=1}^{3}\sum_{j=1}^{3}(\alpha\beta)_{ij}^2$ | 16.95 | <0.001
Error | 9 | 2,888 | 320.889 | $\sigma_e^2$ | |
Total | 17 | 381,240 | 22,425.882 | | |


(b) Reject the hypothesis of no age effects and conclude that different 
age groups result in different mean running time to complete the race 
(p < 0.001). 

(c) Reject the hypothesis of no fitness effects and conclude that the mean 
running times to complete the race are not the same for the three cate- 
gories (p < 0.001). 


It should be noted that the presence of interaction between age group and fitness category seems to be more than just a chance occurrence. The presence of interactions makes the interpretation of the main effects more difficult. Although the F tests remain valid, the hypotheses about the main effects cannot be interpreted only in terms of the $\alpha_i$'s and the $\beta_j$'s. Nevertheless, assuming that the interaction effects are unimportant, we illustrate the use of orthogonal contrasts to partition the sums of squares for age group and fitness category and to perform tests on a contrast.

If the hypothesis of no interaction were true, we could make general comparisons among the fitness categories rather than separate comparisons for each age group. Similarly, we might make general comparisons among the age groups rather than separate comparisons for each fitness category. For example, we could compare fitness category low versus high, and also low and high versus medium. The contrasts for making these comparisons would be


$$L_1 = \beta_1 - \beta_3$$

and

$$L_2 = \beta_1 + \beta_3 - 2\beta_2,$$
respectively. The single-degree-of-freedom sums of squares associated with $L_1$ and $L_2$ are obtained as follows:

$$SS_{L_1} = \frac{(5{,}005 - 3{,}619)^2}{6[(1)^2 + (-1)^2]} = 160{,}083$$

and

$$SS_{L_2} = \frac{(5{,}005 + 3{,}619 - 2 \times 4{,}228)^2}{6[(1)^2 + (1)^2 + (-2)^2]} = 784.$$

Notice that $SS_{L_1} + SS_{L_2} = SS_B$, since $L_1$ and $L_2$ are two orthogonal contrasts that partition the sum of squares for fitness into two single-degree-of-freedom sums of squares. The computed F values corresponding to $L_1$ and $L_2$ are, respectively,

$$F_1 = \frac{160{,}083}{320.889} = 498.87 \qquad \text{and} \qquad F_2 = \frac{784}{320.889} = 2.44.$$

Comparing these values to the critical value F[1, 9; 0.95] = 5.12, we find that $F_1$ is highly significant ($p < 0.001$) but $F_2$ falls well below the critical value ($p = 0.15$). Thus, the results of the F tests indicate that the hypothesis $H_0\colon \beta_1 - \beta_3 = 0$ is rejected, whereas the hypothesis $H_0\colon \beta_1 + \beta_3 - 2\beta_2 = 0$ is sustained.
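Each single-degree-of-freedom sum of squares above has the form $(\sum_j c_j y_{.j.})^2 / (an\sum_j c_j^2)$, where the $y_{.j.}$ are the fitness totals and $an = 6$ is the number of observations in each total. A minimal Python sketch of that rule, using exact fractions and the error mean square from Table 4.10 (the helper name is hypothetical), is:

```python
from fractions import Fraction


def contrast_ss(coeffs, totals, reps):
    """Single-df sum of squares (sum c_j * T_j)^2 / (reps * sum c_j^2).

    totals are treatment totals, each based on `reps` observations;
    Fraction keeps the arithmetic exact.
    """
    if sum(coeffs) != 0:
        raise ValueError("contrast coefficients must sum to zero")
    num = sum(c * t for c, t in zip(coeffs, totals)) ** 2
    return Fraction(num, reps * sum(c * c for c in coeffs))


fitness_totals = [5005, 4228, 3619]  # low, medium, high; 6 observations each
ms_e = 320.889                       # error mean square from Table 4.10

ss_l1 = contrast_ss([1, 0, -1], fitness_totals, reps=6)  # low vs high
ss_l2 = contrast_ss([1, -2, 1], fitness_totals, reps=6)  # low and high vs medium
print(float(ss_l1), float(ss_l2), float(ss_l1 + ss_l2))
print(round(float(ss_l1) / ms_e, 2), round(float(ss_l2) / ms_e, 2))
```

The same helper applied to the age totals 3,619, 4,111, and 5,122 gives the two age-group contrasts mentioned next.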

Similarly, we could compare age groups 1 and 3, and also age groups 1 and 3 versus 2. The resulting F ratios, each with 1 and 9 degrees of freedom, are both highly significant (p < 0.001). The results indicate that the lowest age group requires the least running time, whereas the highest age group requires the most. However, as indicated previously, the researcher should be cautious in drawing any general conclusions because of the strong interaction effects between the factors. The running time within each age group varies greatly according to the fitness category. Although, in each age group, the running time decreases dramatically as we move from the low to the high fitness category, the decrease is much greater for the upper age group than for the lower and middle age groups.


4.14 WORKED EXAMPLE FOR MODEL I: UNEQUAL SAMPLE 
SIZES PER CELL 


The following example is based on an unbalanced design described in Blackwell et al. (1991, pp. 286-287). The original data came from a balanced factorial experiment reported by Snedecor and Cochran (1989, p. 304) to test the effectiveness of two factors: source of protein, at three levels (beef, pork, and cereal), and quantity of protein, at two levels (high and low), forming six potential protein feeding treatments. Ten male rats were randomly assigned to each treatment, and their gains in weight were recorded. The experimental data were made unbalanced by deleting eight observations; the remaining observations are given in Table 4.11.

TABLE 4.11
Weight Gains (in grams) of Rats under Different Diets (Data Made
Unbalanced by Deleting Observations)

Source of Protein | Quantity of Protein: High | Quantity of Protein: Low
Beef | 81, 100, 102, 104, 107, 111, 117, 118 | 51, 64, 72, 76, 78, 86, 95
Pork | 79, 91, 94, 96, 98, 102, 102, 108 | 49, 70, 73, 81, 82, 82, 86, 97, 106
Cereal | 56, 74, 77, 82, 86, 88, 92, 95, 98, 111 | 58, 67, 74, 74, 80, 89, 95, 97, 98, 107

Source: Blackwell et al. (1991, p. 287). Used with permission.

In the following, we illustrate both the unweighted-means analysis and the weighted-squares-of-means analysis. In this type of analysis, two models are used. The model for the mean of each subclass is

$$\bar{x}_{ij} = \bar{y}_{ij.} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \bar{e}_{ij.}, \qquad i = 1, 2, 3;\ j = 1, 2,$$

where $\mu$ is the general mean, $\alpha_i$ is the effect of the $i$-th source of protein ($\sum_{i=1}^{3}\alpha_i = 0$), $\beta_j$ is the effect of the $j$-th quantity of protein ($\sum_{j=1}^{2}\beta_j = 0$), $(\alpha\beta)_{ij}$ is the fixed interaction effect of the $i$-th protein source with the $j$-th protein quantity ($\sum_{i=1}^{3}(\alpha\beta)_{ij} = 0 = \sum_{j=1}^{2}(\alpha\beta)_{ij}$), and $\bar{e}_{ij.} = \sum_{k=1}^{n_{ij}} e_{ijk}/n_{ij}$ is the experimental error associated with the $(i, j)$-th cell, which is assumed to be independently and normally distributed with mean zero and variance $\sigma^2$. The preceding equation represents the model being assumed when the sums of squares for factors A and B and the interaction AB are computed. For computing the error sum of squares, the model being used is

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk},$$

where, again, $\sum_{i=1}^{3}\alpha_i = \sum_{j=1}^{2}\beta_j = \sum_{i=1}^{3}(\alpha\beta)_{ij} = \sum_{j=1}^{2}(\alpha\beta)_{ij} = 0$, and $e_{ijk}$ is the experimental error, assumed to be independently and normally distributed with mean zero and variance $\sigma_e^2$. The expectation of the error mean square is $\sigma_e^2$, whereas in the unweighted-means analysis $\sigma^2$ is taken to be $\sigma_e^2/n_h$, where $n_h$ is the harmonic mean of the cell sizes; the error mean square therefore has to be multiplied by $1/n_h$ so that it yields an unbiased estimate of $\sigma^2$.


The computations proceed as follows:

(i) The cell counts ($n_{ij}$):

$n_{11}$ = 8, $n_{12}$ = 7,
$n_{21}$ = 8, $n_{22}$ = 9,
$n_{31}$ = 10, $n_{32}$ = 10.

(ii) The cell means ($\bar{x}_{ij}$):

$\bar{x}_{11}$ = 105.0000, $\bar{x}_{12}$ = 74.5714,
$\bar{x}_{21}$ = 96.2500, $\bar{x}_{22}$ = 80.6667,
$\bar{x}_{31}$ = 85.9000, $\bar{x}_{32}$ = 83.9000.

(iii) The row (protein source) means:

$\bar{x}_{1.}$ = 89.7857, $\bar{x}_{2.}$ = 88.4584, $\bar{x}_{3.}$ = 84.9000.

(iv) The column (protein quantity) means:

$\bar{x}_{.1}$ = 95.7167, $\bar{x}_{.2}$ = 79.7127.

(v) The grand mean:

$\bar{x}_{..}$ = 87.7147.

(vi) $\sum_{i=1}^{3}\sum_{j=1}^{2}\bar{x}_{ij}^2$ = (105.0000)² + (74.5714)² + ⋯ + (83.9000)² = 46,775.0927.

(vii) $3 \times 2\,\bar{x}_{..}^2$ = 6(87.7147)² = 46,163.2116.

(viii) $2\sum_{i=1}^{3}\bar{x}_{i.}^2$ = 2[(89.7857)² + (88.4584)² + (84.9000)²] = 46,188.7409.

(ix) $3\sum_{j=1}^{2}\bar{x}_{.j}^2$ = 3[(95.7167)² + (79.7127)²] = 46,547.4036.


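Now, the unweighted sums of squares for factor A (protein source) and factor B (protein quantity) are:

$$SS_{Au} = 2\sum_{i=1}^{3}(\bar{x}_{i.} - \bar{x}_{..})^2 = 2\sum_{i=1}^{3}\bar{x}_{i.}^2 - 3 \times 2\,\bar{x}_{..}^2 = 46{,}188.7409 - 46{,}163.2116 = 25.5293$$

and

$$SS_{Bu} = 3\sum_{j=1}^{2}(\bar{x}_{.j} - \bar{x}_{..})^2 = 3\sum_{j=1}^{2}\bar{x}_{.j}^2 - 3 \times 2\,\bar{x}_{..}^2 = 46{,}547.4036 - 46{,}163.2116 = 384.1920.$$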
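The unweighted-means quantities can be checked with a few lines of Python. The sketch below (plain Python, data typed in from Table 4.11) computes the cell means, the unweighted sums of squares on the cell-mean scale, the interaction sum of squares as a sum of squared interaction residuals (algebraically equal to the formula used later in the text), and the harmonic mean $n_h$ of the cell sizes. Because it carries full precision instead of four-decimal intermediate values, the results can differ from the hand computations in the last reported digits.

```python
# Table 4.11: gains[i][j] for protein source i (beef, pork, cereal)
# and quantity j (high, low).
gains = [
    [[81, 100, 102, 104, 107, 111, 117, 118], [51, 64, 72, 76, 78, 86, 95]],
    [[79, 91, 94, 96, 98, 102, 102, 108], [49, 70, 73, 81, 82, 82, 86, 97, 106]],
    [[56, 74, 77, 82, 86, 88, 92, 95, 98, 111],
     [58, 67, 74, 74, 80, 89, 95, 97, 98, 107]],
]

a, b = len(gains), len(gains[0])
means = [[sum(c) / len(c) for c in row] for row in gains]               # x-bar_ij
row_means = [sum(r) / b for r in means]                                 # x-bar_i.
col_means = [sum(means[i][j] for i in range(a)) / a for j in range(b)]  # x-bar_.j
grand_mean = sum(row_means) / a                                         # x-bar_..

# Unweighted sums of squares, on the cell-mean scale as in the text.
ss_a_u = b * sum((m - grand_mean) ** 2 for m in row_means)
ss_b_u = a * sum((m - grand_mean) ** 2 for m in col_means)
ss_ab_u = sum(
    (means[i][j] - row_means[i] - col_means[j] + grand_mean) ** 2
    for i in range(a) for j in range(b)
)

# Harmonic mean of the cell sizes, n_h.
sizes = [len(c) for row in gains for c in row]
n_h = len(sizes) / sum(1 / k for k in sizes)

print(round(ss_a_u, 4), round(ss_b_u, 4), round(ss_ab_u, 4), round(n_h, 4))
```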


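To calculate the corresponding weighted sums of squares, we have

$$w_{A1} = \frac{b^2}{1/n_{11} + 1/n_{12}} = \frac{(2)^2}{1/8 + 1/7} = 14.9333, \qquad w_{A2} = \frac{b^2}{1/n_{21} + 1/n_{22}} = \frac{(2)^2}{1/8 + 1/9} = 16.9412,$$

$$w_{A3} = \frac{b^2}{1/n_{31} + 1/n_{32}} = \frac{(2)^2}{1/10 + 1/10} = 20.0000,$$

$$w_{B1} = \frac{a^2}{1/n_{11} + 1/n_{21} + 1/n_{31}} = \frac{(3)^2}{1/8 + 1/8 + 1/10} = 25.7143, \qquad w_{B2} = \frac{a^2}{1/n_{12} + 1/n_{22} + 1/n_{32}} = \frac{(3)^2}{1/7 + 1/9 + 1/10} = 25.4260,$$

$$\bar{x}_A = \frac{\sum_{i=1}^{3} w_{Ai}\bar{x}_{i.}}{\sum_{i=1}^{3} w_{Ai}} = \frac{(14.9333)(89.7857) + (16.9412)(88.4584) + (20.0000)(84.9000)}{14.9333 + 16.9412 + 20.0000} = 87.4686,$$

and

$$\bar{x}_B = \frac{\sum_{j=1}^{2} w_{Bj}\bar{x}_{.j}}{\sum_{j=1}^{2} w_{Bj}} = \frac{(25.7143)(95.7167) + (25.4260)(79.7127)}{25.7143 + 25.4260} = 87.7598.$$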


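Therefore, the corresponding weighted sums of squares are:

$$
\begin{aligned}
SS_{Aw} &= \sum_{i=1}^{3} w_{Ai}(\bar{x}_{i.} - \bar{x}_A)^2\\
&= 14.9333(89.7857 - 87.4686)^2 + 16.9412(88.4584 - 87.4686)^2 + 20.0000(84.9000 - 87.4686)^2\\
&= 228.7277
\end{aligned}
$$

and

$$
\begin{aligned}
SS_{Bw} &= \sum_{j=1}^{2} w_{Bj}(\bar{x}_{.j} - \bar{x}_B)^2\\
&= 25.7143(95.7167 - 87.7598)^2 + 25.4260(79.7127 - 87.7598)^2\\
&= 3{,}274.5118.
\end{aligned}
$$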
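The weighted-squares-of-means computation can be sketched the same way. In the Python below (data from Table 4.11; the helper name `weighted_ss` is hypothetical), the weights are $w_{Ai} = b^2/\sum_j n_{ij}^{-1}$ and $w_{Bj} = a^2/\sum_i n_{ij}^{-1}$, exactly as used above, and each sum of squares is taken about the weighted mean. Small differences from the hand values reflect rounding of intermediate results.

```python
gains = [
    [[81, 100, 102, 104, 107, 111, 117, 118], [51, 64, 72, 76, 78, 86, 95]],
    [[79, 91, 94, 96, 98, 102, 102, 108], [49, 70, 73, 81, 82, 82, 86, 97, 106]],
    [[56, 74, 77, 82, 86, 88, 92, 95, 98, 111],
     [58, 67, 74, 74, 80, 89, 95, 97, 98, 107]],
]

a, b = len(gains), len(gains[0])
n = [[len(c) for c in row] for row in gains]
means = [[sum(c) / len(c) for c in row] for row in gains]
row_means = [sum(r) / b for r in means]
col_means = [sum(means[i][j] for i in range(a)) / a for j in range(b)]

# Weights w_Ai = b^2 / sum_j (1 / n_ij) and w_Bj = a^2 / sum_i (1 / n_ij).
w_a = [b ** 2 / sum(1 / k for k in n[i]) for i in range(a)]
w_b = [a ** 2 / sum(1 / n[i][j] for i in range(a)) for j in range(b)]


def weighted_ss(weights, values):
    """Return (sum w (x - x_w)^2, x_w), where x_w = sum(w x) / sum(w)."""
    center = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    return sum(w * (x - center) ** 2 for w, x in zip(weights, values)), center


ss_a_w, xbar_a = weighted_ss(w_a, row_means)
ss_b_w, xbar_b = weighted_ss(w_b, col_means)
print([round(w, 4) for w in w_a], [round(w, 4) for w in w_b])
print(round(xbar_a, 4), round(ss_a_w, 4), round(xbar_b, 4), round(ss_b_w, 4))
```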


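The interaction and error sums of squares for both unweighted and weighted analyses are:

$$
\begin{aligned}
SS_{AB} &= \sum_{i=1}^{3}\sum_{j=1}^{2}\bar{x}_{ij}^2 - 2\sum_{i=1}^{3}\bar{x}_{i.}^2 - 3\sum_{j=1}^{2}\bar{x}_{.j}^2 + 3 \times 2\,\bar{x}_{..}^2\\
&= 46{,}775.0927 - 46{,}188.7409 - 46{,}547.4036 + 46{,}163.2116\\
&= 202.1598
\end{aligned}
$$

and

$$SS_E = \sum_{i=1}^{3}\sum_{j=1}^{2}\sum_{k=1}^{n_{ij}} y_{ijk}^2 - \sum_{i=1}^{3}\sum_{j=1}^{2}\frac{y_{ij.}^2}{n_{ij}} = 413{,}088 - 403{,}983.0043 = 9{,}104.9957.$$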


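Finally, the total sum of squares is obtained as

$$
\begin{aligned}
SS_T &= \sum_{i=1}^{3}\sum_{j=1}^{2}\sum_{k=1}^{n_{ij}}(y_{ijk} - \bar{y}_{...})^2 = \sum_{i=1}^{3}\sum_{j=1}^{2}\sum_{k=1}^{n_{ij}} y_{ijk}^2 - \frac{y_{...}^2}{N}\\
&= 413{,}088 - (4{,}556)^2/52\\
&= 13{,}912.3077.
\end{aligned}
$$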


These results, along with the remaining analysis of variance computations, are summarized in Table 4.12. Note that, for both weighted and unweighted analyses, the sums of squares do not add up to the total sum of squares. If we choose the level of significance $\alpha = 0.05$, we find from Appendix Table V that

F[2, 46; 0.95] = 3.20

and

F[1, 46; 0.95] = 4.05.


Comparing these values with the computed F values given in Table 4.12 for both unweighted and weighted analyses, we may reach the following conclusions:

(a) Reject the hypothesis of no interaction effects and conclude that there is some evidence of interaction between protein source and protein quantity.

(b) Do not reject the hypothesis of no protein source effects and conclude that there is a lack of evidence that the mean weight gains (population marginal means) for the three sources of protein differ.

(c) Reject the hypothesis of no protein quantity effects and conclude that there is strong evidence that the mean weight gains for the two quantities of protein (marginal means for the high and low levels) differ significantly.
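The decisions in (a) through (c) can be checked for the unweighted-means analysis with the following Python sketch. It assumes, as described above, that the F ratios are formed by dividing each mean square computed from the cell means by $MS_E/n_h$; the entries of Table 4.12 itself are not reproduced here. The computed $SS_E$ differs from the hand value in the second decimal because the latter rounds $\sum\sum y_{ij.}^2/n_{ij}$.

```python
gains = [
    [[81, 100, 102, 104, 107, 111, 117, 118], [51, 64, 72, 76, 78, 86, 95]],
    [[79, 91, 94, 96, 98, 102, 102, 108], [49, 70, 73, 81, 82, 82, 86, 97, 106]],
    [[56, 74, 77, 82, 86, 88, 92, 95, 98, 111],
     [58, 67, 74, 74, 80, 89, 95, 97, 98, 107]],
]

a, b = len(gains), len(gains[0])
obs = [y for row in gains for c in row for y in c]
big_n = len(obs)
cell_lists = [c for row in gains for c in row]

# Error and total sums of squares on the observation scale.
ss_e = sum(y * y for y in obs) - sum(sum(c) ** 2 / len(c) for c in cell_lists)
ss_t = sum(y * y for y in obs) - sum(obs) ** 2 / big_n
df_e = big_n - a * b
ms_e = ss_e / df_e

# Unweighted-means F ratios: cell-mean-scale mean squares over MS_E / n_h.
means = [[sum(c) / len(c) for c in row] for row in gains]
row_m = [sum(r) / b for r in means]
col_m = [sum(means[i][j] for i in range(a)) / a for j in range(b)]
g = sum(row_m) / a
ss_a_u = b * sum((m - g) ** 2 for m in row_m)
ss_b_u = a * sum((m - g) ** 2 for m in col_m)
ss_ab_u = sum((means[i][j] - row_m[i] - col_m[j] + g) ** 2
              for i in range(a) for j in range(b))
n_h = len(cell_lists) / sum(1 / len(c) for c in cell_lists)
scaled_ms_e = ms_e / n_h
f_a = ss_a_u / (a - 1) / scaled_ms_e
f_b = ss_b_u / (b - 1) / scaled_ms_e
f_ab = ss_ab_u / ((a - 1) * (b - 1)) / scaled_ms_e

print(round(ss_e, 4), round(ss_t, 4), df_e)
print(round(f_a, 2), round(f_b, 2), round(f_ab, 2))
```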


It should be noted that the presence of interaction between protein source and 
quantity seems to be more than just a chance occurrence. The presence of 
interactions makes the statements about main effects somewhat difficult to 
interpret. Although the F tests are still valid, the hypotheses about the main effects cannot be interpreted only in terms of the $\alpha_i$'s and the $\beta_j$'s.


4.15 WORKED EXAMPLE FOR MODEL II


Burdick and Graybill (1992, pp. 11-12) described a quality control experiment 
designed to study the sources of variability in the length of window screens. It 
is desired to determine the contribution of the variability in the final product 
that is due to operators, machines, and the operator x machine interaction. 
Three operators and four machines are randomly selected from the available 
operators and machines in the company and each operator makes two screens 
on each of the selected machines. The data collected in the experiment are 
given in Table 4.13. This is an example of a two-way crossed classification 
with replication. Here, our two factors are operators and machines and the 
experimental units that provide the replication are the machine-operator duos. 
Inasmuch as both factors are randomly selected from a rather large population, 
the data should be analyzed using Model II. Furthermore, since there are two 
observations for each combination of operator and machine, this will enable 
the experimenter to test for the presence of any interaction. 
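The mathematical model for this experiment would be

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}, \qquad i = 1, 2, 3;\ j = 1, 2, 3, 4;\ k = 1, 2,$$

where $\mu$ is the general mean, $\alpha_i$ is the effect of the $i$-th operator, $\beta_j$ is the effect of the $j$-th machine, $(\alpha\beta)_{ij}$ is the interaction of the $i$-th operator with the $j$-th machine, and the $e_{ijk}$'s are experimental errors. It is further assumed that $\alpha_i \sim N(0, \sigma_\alpha^2)$, $\beta_j \sim N(0, \sigma_\beta^2)$, $(\alpha\beta)_{ij} \sim N(0, \sigma_{\alpha\beta}^2)$, and $e_{ijk} \sim N(0, \sigma_e^2)$; and that the $\alpha_i$'s, $\beta_j$'s, $(\alpha\beta)_{ij}$'s, and $e_{ijk}$'s are mutually and completely independent.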

The following computations lead to the analysis of variance table. 


(i) The cell totals:

$y_{11.}$ = 71.5, $y_{12.}$ = 72.8, $y_{13.}$ = 70.3, $y_{14.}$ = 72.0;
$y_{21.}$ = 71.5, $y_{22.}$ = 71.5, $y_{23.}$ = 72.9, $y_{24.}$ = 69.8;
$y_{31.}$ = 70.7, $y_{32.}$ = 72.1, $y_{33.}$ = 72.0, $y_{34.}$ = 72.4.




TABLE 4.13
Screen Lengths (in inches) from
a Quality Control Experiment


                 Machine
Operator      1       2       3       4
1             36.3    36.7    35.1    35.2
              35.2    36.1    35.2    36.8
2             35.2    35.3    36.8    34.9
              36.3    36.2    36.1    34.9
3             35.8    36.0    35.9    36.3
              34.9    36.1    36.1    36.1


Source: Burdick and Graybill (1992, p. 118).
Used with permission.


(ii) The row totals:
$y_{1..}$ = 286.6, $y_{2..}$ = 285.7, $y_{3..}$ = 287.2.
(iii) The column totals:
$y_{.1.}$ = 213.7, $y_{.2.}$ = 216.4, $y_{.3.}$ = 215.2, $y_{.4.}$ = 214.2.

(iv) The grand total:

$y_{...}$ = 859.5.
(v) $\sum_{i=1}^{3}\sum_{j=1}^{4}\sum_{k=1}^{2} y_{ijk}^2$ = (36.3)² + (35.2)² + ⋯ + (36.1)² = 30,789.67.
(vi) $y_{...}^2/(3 \times 4 \times 2)$ = (859.5)²/24 = 30,780.8438.
(vii) $\frac{1}{4 \times 2}\sum_{i=1}^{3} y_{i..}^2$ = [(286.6)² + (285.7)² + (287.2)²]/8 = 30,780.9863.
TABLE 4.14
Analysis of Variance for the Screen Lengths Data of Table 4.13
Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | Expected Mean Square | F Value | p-Value
Operator | 2 | 0.1425 | 0.071 | $\sigma_e^2 + 2\sigma_{\alpha\beta}^2 + 4 \times 2\sigma_\alpha^2$ | 0.101 | 0.905
Machine | 3 | 0.7112 | 0.237 | $\sigma_e^2 + 2\sigma_{\alpha\beta}^2 + 3 \times 2\sigma_\beta^2$ | 0.339 | 0.798
Interaction | 6 | 4.1975 | 0.700 | $\sigma_e^2 + 2\sigma_{\alpha\beta}^2$ | 2.222 | 0.113
Error | 12 | 3.7750 | 0.315 | $\sigma_e^2$ | |
Total | 23 | 8.8262 | | | |

(viii) $\frac{1}{3 \times 2}\sum_{j=1}^{4} y_{.j.}^2$ = [(213.7)² + (216.4)² + ⋯ + (214.2)²]/6 = 30,781.5550.
(ix) $\frac{1}{2}\sum_{i=1}^{3}\sum_{j=1}^{4} y_{ij.}^2$ = [(71.5)² + (72.8)² + ⋯ + (72.4)²]/2 = 30,785.8950.

(x) $SS_T$ = 30,789.67 − 30,780.8438 = 8.8262.

(xi) $SS_{TC}$ = 30,785.8950 − 30,780.8438 = 5.0512.
(xii) $SS_A$ = 30,780.9863 − 30,780.8438 = 0.1425.
(xiii) $SS_B$ = 30,781.5550 − 30,780.8438 = 0.7112.
(xiv) $SS_{AB}$ = 5.0512 − 0.1425 − 0.7112 = 4.1975.
(xv) $SS_E$ = 8.8262 − 5.0512 = 3.7750.
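Steps (i) through (xv) and the F ratios of Table 4.14 can be verified with the short Python sketch below (data typed in from Table 4.13). Under Model II the interaction is tested against error and each main effect against the interaction mean square, which is what the last assignments encode.

```python
# Table 4.13: lengths[i][j] holds the two screens made by operator i on machine j.
lengths = [
    [[36.3, 35.2], [36.7, 36.1], [35.1, 35.2], [35.2, 36.8]],
    [[35.2, 36.3], [35.3, 36.2], [36.8, 36.1], [34.9, 34.9]],
    [[35.8, 34.9], [36.0, 36.1], [35.9, 36.1], [36.3, 36.1]],
]

a, b, n = len(lengths), len(lengths[0]), len(lengths[0][0])
cells = [[sum(c) for c in row] for row in lengths]
rows = [sum(r) for r in cells]
cols = [sum(cells[i][j] for i in range(a)) for j in range(b)]
ct = sum(rows) ** 2 / (a * b * n)  # correction term

ss_t = sum(y * y for row in lengths for c in row for y in c) - ct
ss_a = sum(r * r for r in rows) / (b * n) - ct
ss_b = sum(c * c for c in cols) / (a * n) - ct
ss_tc = sum(c * c for row in cells for c in row) / n - ct
ss_ab = ss_tc - ss_a - ss_b
ss_e = ss_t - ss_tc

ms = {
    "A": ss_a / (a - 1),
    "B": ss_b / (b - 1),
    "AB": ss_ab / ((a - 1) * (b - 1)),
    "E": ss_e / (a * b * (n - 1)),
}
# Model II F ratios: interaction over error, main effects over interaction.
f_ab = ms["AB"] / ms["E"]
f_a = ms["A"] / ms["AB"]
f_b = ms["B"] / ms["AB"]
print(round(ss_t, 4), round(ss_a, 4), round(ss_b, 4), round(ss_ab, 4), round(ss_e, 4))
print(round(f_a, 3), round(f_b, 3), round(f_ab, 3))
```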


These results along with the remaining computations are summarized in Table 
4.14. 

We can test the hypotheses of interest using the results shown in Table 4.14. 
The presence of the interaction is tested by comparing the ratio 2.222 with 
the theoretical F distribution with (6, 12) degrees of freedom which is not 
significant (p = 0.113). Hence, there is no evidence of the existence of any 
interaction effects. The existence of a main effect due to operators is tested 
by comparing the ratio 0.101 with the theoretical F distribution with (2, 6) 
degrees of freedom which is also not significant (p = 0.905). Similarly, the 
other main effect due to machines is tested by comparing the ratio 0.339 with 
the F distribution with (3, 6) degrees of freedom, and this again is not significant
(p = 0.798). Thus, we may conclude that there are no significant differences 
between the operators as well as between the machines, and also there is no 
evidence of any interaction between the two factors. 
Furthermore, to assess the relative contribution of the variance components, we may obtain their estimates using formulae (4.7.20) through (4.7.23). Thus, we find that

$$\hat{\sigma}_e^2 = 0.315,$$

$$\hat{\sigma}_{\alpha\beta}^2 = \frac{1}{2}(0.700 - 0.315) = 0.193,$$

$$\hat{\sigma}_\beta^2 = \frac{1}{6}(0.237 - 0.700) = -0.077,$$

and

$$\hat{\sigma}_\alpha^2 = \frac{1}{8}(0.071 - 0.700) = -0.079.$$


The negative estimates are an indication that the corresponding variance com- 
ponents may be zero. The results are consistent with the tests of hypotheses 
performed earlier. It is further evident that the larger part of the variability 
arises in the replication of measurements. 
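These estimates come from equating each mean square in Table 4.14 to its expectation and solving from the bottom row up. The Python sketch below encodes those four equations as read off the expected mean squares (formulae (4.7.20) through (4.7.23) themselves are not reproduced here). The function name is hypothetical, and negative estimates are returned unchanged rather than truncated at zero.

```python
def variance_components(ms_a, ms_b, ms_ab, ms_e, a, b, n):
    """ANOVA estimates for the balanced two-way random model with interaction.

    Solves E(MS_E) = s2e, E(MS_AB) = s2e + n*s2ab,
    E(MS_B) = s2e + n*s2ab + a*n*s2b and E(MS_A) = s2e + n*s2ab + b*n*s2a.
    """
    s2e = ms_e
    s2ab = (ms_ab - ms_e) / n
    s2b = (ms_b - ms_ab) / (a * n)
    s2a = (ms_a - ms_ab) / (b * n)
    return s2e, s2ab, s2b, s2a


# Mean squares from Table 4.14 (a = 3 operators, b = 4 machines, n = 2).
est = variance_components(0.071, 0.237, 0.700, 0.315, a=3, b=4, n=2)
print(est)
```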


4.16 WORKED EXAMPLE FOR MODEL III 


The following example is taken from an experiment described in Youden (1951, 
pp. 64-65). An experiment was performed to determine the effect of time aging 
on the strength of cement. Three mixes of cement were prepared and six specimens were made from each mix. Three specimens from each mix were tested after two days and the other three after seven days. The test specimens were two-inch cubes, and the loads under which they yielded were measured in units of 10 pounds. The data are presented in Table 4.15.

The experiment just described constitutes a mixed effects model. The mixes 
are random components, a sample of three drawn from a large number of mixes. 
The results of the experiment should be valid for the entire distribution of mixes. 
On the other hand, effects of aging are fixed effects. The conclusions of the ex- 
periment will reveal whether the yield loads differ after two or seven days, these 
periods being fixed. Hence, the data of Table 4.15 should be analyzed using 
Model III. Since there are three observations for each mix and aging combi- 
nation, this will enable the experimenter to test for the presence of interaction. 
Interaction terms cannot be ignored, since it is quite possible that the three mixes differ after a long period of time without differing after a short period. In other words, the effect of an additional period of aging may be different for the three mixes; that is, interaction may be present.
TABLE 4.15
Yield Loads for Cement Specimens

                      Mix
Aging           Mix 1    Mix 2    Mix 3
2-Day Test      574      524      576
                564      573      540
                550      551      592
7-Day Test      1,092    1,028    1,066
                1,086    1,073    1,045
                1,065    998      1,055


Source: Youden (1951, p. 65). Used with permission.


The mathematical model for this experiment would be

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}, \qquad i = 1, 2;\ j = 1, 2, 3;\ k = 1, 2, 3,$$

where $\mu$ is the general mean; $\alpha_i$ is the effect of the $i$-th “aging” ($\sum_{i=1}^{2}\alpha_i = 0$); $\beta_j$ is the effect of the $j$-th “mix” and is a random variable assumed to be normally distributed with mean zero and variance $\sigma_\beta^2$; $(\alpha\beta)_{ij}$ is the interaction of the $i$-th “aging” with the $j$-th mix and is a random variable assumed to be normally distributed with mean zero and variance $\sigma_{\alpha\beta}^2$ ($\sum_{i=1}^{2}(\alpha\beta)_{ij} = 0$ for $j = 1, 2, 3$); and the $e_{ijk}$'s are experimental errors assumed to be independently and normally distributed with mean zero and variance $\sigma_e^2$.


The following computations lead to the analysis of variance table. 


(i) The cell totals:

$y_{11.}$ = 1,688, $y_{12.}$ = 1,648, $y_{13.}$ = 1,708,
$y_{21.}$ = 3,243, $y_{22.}$ = 3,099, $y_{23.}$ = 3,166.

(ii) The row totals:
$y_{1..}$ = 5,044, $y_{2..}$ = 9,508.
(iii) The column totals:

$y_{.1.}$ = 4,931, $y_{.2.}$ = 4,747, $y_{.3.}$ = 4,874.
(iv) The grand total:

$y_{...}$ = 5,044 + 9,508 = 14,552.

(v) $\sum_{i=1}^{2}\sum_{j=1}^{3}\sum_{k=1}^{3} y_{ijk}^2$ = (574)² + (564)² + ⋯ + (1,055)² = 12,882,026.
(vi) $y_{...}^2/(2 \times 3 \times 3)$ = (14,552)²/18 = 11,764,483.5556.
(vii) $\frac{1}{3 \times 3}\sum_{i=1}^{2} y_{i..}^2$ = [(5,044)² + (9,508)²]/9 = 12,871,555.5556.
(viii) $\frac{1}{2 \times 3}\sum_{j=1}^{3} y_{.j.}^2$ = [(4,931)² + (4,747)² + (4,874)²]/6 = 11,767,441.0000.
(ix) $\frac{1}{3}\sum_{i=1}^{2}\sum_{j=1}^{3} y_{ij.}^2$ = [(1,688)² + (1,648)² + ⋯ + (3,166)²]/3 = 12,875,639.3333.

(x) $SS_T$ = 12,882,026 − 11,764,483.5556 = 1,117,542.4444.
(xi) $SS_{TC}$ = 12,875,639.3333 − 11,764,483.5556 = 1,111,155.7777.
(xii) $SS_A$ = 12,871,555.5556 − 11,764,483.5556 = 1,107,072.0000.
(xiii) $SS_B$ = 11,767,441.0000 − 11,764,483.5556 = 2,957.4444.
(xiv) $SS_{AB}$ = 1,111,155.7777 − 1,107,072.0000 − 2,957.4444 = 1,126.3333.
(xv) $SS_E$ = 1,117,542.4444 − 1,111,155.7777 = 6,386.6667.
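The cement computations, and the Model III F ratios of Table 4.16, can be reproduced with the following Python sketch (data typed in from Table 4.15, in units of 10 lb). Note the two different denominators: the fixed aging effect is tested against $MS_{AB}$, while the random mix effect and the interaction are tested against $MS_E$.

```python
# Table 4.15: loads[i][j] lists the three specimens for aging i (2-day, 7-day)
# and mix j, in units of 10 lb.
loads = [
    [[574, 564, 550], [524, 573, 551], [576, 540, 592]],
    [[1092, 1086, 1065], [1028, 1073, 998], [1066, 1045, 1055]],
]

a, b, n = len(loads), len(loads[0]), len(loads[0][0])
cells = [[sum(c) for c in row] for row in loads]
rows = [sum(r) for r in cells]
cols = [sum(cells[i][j] for i in range(a)) for j in range(b)]
ct = sum(rows) ** 2 / (a * b * n)  # correction term, step (vi)

ss_t = sum(y * y for row in loads for c in row for y in c) - ct
ss_a = sum(r * r for r in rows) / (b * n) - ct
ss_b = sum(c * c for c in cols) / (a * n) - ct
ss_tc = sum(c * c for row in cells for c in row) / n - ct
ss_ab = ss_tc - ss_a - ss_b
ss_e = ss_t - ss_tc

ms_a = ss_a / (a - 1)
ms_b = ss_b / (b - 1)
ms_ab = ss_ab / ((a - 1) * (b - 1))
ms_e = ss_e / (a * b * (n - 1))

# Model III (aging fixed, mix random), following Table 4.16.
f_aging = ms_a / ms_ab  # fixed effect tested against interaction
f_mix = ms_b / ms_e     # random effect tested against error
f_inter = ms_ab / ms_e
print(round(f_aging, 2), round(f_mix, 2), round(f_inter, 2))
```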


These results, along with the remaining analysis of variance computations, are summarized in Table 4.16. Note that the F ratios are calculated differently here than in the case of Model I or Model II. The test for interaction is the same, but the test for the random effects involves $MS_B/MS_E$ and the test for the fixed effects involves $MS_A/MS_{AB}$. If we choose the level of significance $\alpha = 0.05$, we find from Appendix Table V that

F[1, 2; 0.95] = 18.51

and

F[2, 12; 0.95] = 3.89.

TABLE 4.16
Analysis of Variance for the Yield Loads Data of Table 4.15
Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | Expected Mean Square | F Value | p-Value
Aging | 1 | 1,107,072.0000 | 1,107,072.000 | $\sigma_e^2 + 3\sigma_{\alpha\beta}^2 + \frac{3 \times 3}{2-1}\sum_{i=1}^{2}\alpha_i^2$ | 1,965.80 | <0.001
Mix | 2 | 2,957.4444 | 1,478.722 | $\sigma_e^2 + 2 \times 3\sigma_\beta^2$ | 2.78 | 0.102
Interaction | 2 | 1,126.3333 | 563.167 | $\sigma_e^2 + 3\sigma_{\alpha\beta}^2$ | 1.06 | 0.377
Error | 12 | 6,386.6667 | 532.222 | $\sigma_e^2$ | |
Total | 17 | 1,117,542.444 | | | |


Comparing these values with the computed F values given in Table 4.16, we 
may reach the following conclusions: 


(a) Do not reject the hypothesis of no interaction effects and conclude that 
the data do not give sufficient evidence of the existence of interaction 
between the “aging” and the “mixes” (p = 0.377). 

(b) Do not reject the hypothesis of no “mixes” effects and conclude that the data do not give sufficient evidence that the mean strength of cement varies in the population of mixes (p = 0.102).

(c) Reject the hypothesis of no “aging” effects and conclude that there is a 
significant effect from the additional five days of aging. 


Furthermore, we can compare the two- and seven-day aging effects by using Tukey's and Scheffé's methods of simultaneous confidence intervals. For Tukey's procedure, we find from Appendix Table X that

$$q[a, (a-1)(b-1); 1-\alpha] = q[2, 2; 0.95] = 6.09.$$


So that

$$q[a, (a-1)(b-1); 1-\alpha]\sqrt{\frac{MS_{AB}}{bn}} = 6.09\sqrt{\frac{563.167}{3 \times 3}} = 48.17,$$

and, from (4.12.3), a 95 percent simultaneous confidence interval for $\alpha_2 - \alpha_1$ is given as

$$(1{,}056.44 - 560.44) - 48.17 < \alpha_2 - \alpha_1 < (1{,}056.44 - 560.44) + 48.17$$
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or 
447.83 < a2 —a, < 544.17. 
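For Scheffé’s procedure, we find from Appendix Table V that 

$$S^2 = F[a-1, (a-1)(b-1); 1-\alpha] = F[1, 2; 0.95] = 18.51,$$

so that, with contrast coefficients $\ell_1 = -1$ and $\ell_2 = 1$ (so that $\sum \ell_i^2 = 2$), 

$$\left[S^2 (a-1)(bn)^{-1} MS_{AB} \sum_{i=1}^{a} \ell_i^2\right]^{1/2} = \left[18.51(1)(3 \times 3)^{-1}(563.167)(2)\right]^{1/2} = 48.13$$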


and from (4.12.4) a 95 percent simultaneous confidence interval for $\alpha_2 - \alpha_1$ is 
given as 

$$(1{,}056.44 - 560.44) - 48.13 < \alpha_2 - \alpha_1 < (1{,}056.44 - 560.44) + 48.13$$

or 

$$447.87 < \alpha_2 - \alpha_1 < 544.13.$$


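Notice that with $a = 2$, there is only one contrast, and both Tukey’s and Scheffé’s 
procedures are equivalent to the usual $t$ test. The small difference between 
48.17 and 48.13 is due to rounding in the tabled critical values 
($6.09^2/2 = 18.54$ versus 18.51). 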
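To see the equivalence numerically, the sketch below recomputes both half-widths from the tabled constants quoted above, together with the ordinary $t$ interval half-width $t[2; 0.975]\sqrt{2MS_{AB}/(bn)}$. The value $t[2; 0.975] = 4.303$ is a standard table entry that the text does not quote, so treat it as an added assumption. Variable names are illustrative.

```python
import math

# Quantities from the yield-load example (a = 2 agings, b = 3 mixes, n = 3).
a, b, n = 2, 3, 3
ms_ab = 563.167          # MS_AB, the error term for the fixed aging effect
q_tab = 6.09             # q[2, 2; 0.95] from the studentized range table
f_tab = 18.51            # F[1, 2; 0.95] from the F table
t_tab = 4.303            # t[2; 0.975]; assumed tabled value, not quoted in the text

diff = 1056.44 - 560.44  # difference of the two aging means

# Tukey: q * sqrt(MS_AB / (b n)).
tukey_half = q_tab * math.sqrt(ms_ab / (b * n))

# Scheffe: sqrt(S^2 (a - 1) (b n)^(-1) MS_AB * sum(l_i^2)) with l = (-1, 1).
contrast = (-1, 1)
scheffe_half = math.sqrt(f_tab * (a - 1) * ms_ab * sum(c * c for c in contrast) / (b * n))

# Ordinary t interval for a single contrast: t * sqrt(2 MS_AB / (b n)).
t_half = t_tab * math.sqrt(2 * ms_ab / (b * n))

for name, h in [("Tukey", tukey_half), ("Scheffe", scheffe_half), ("t", t_half)]:
    print(f"{name:8s} {diff - h:8.2f} < alpha2 - alpha1 < {diff + h:8.2f}")
```

All three half-widths agree to within about 0.05, which reflects only the rounding of the three tabled constants.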

Finally, suppose it is desired to determine the power of the test when the 
difference in time effect is as large as 400 psi. Since the measurements on the 
test specimens are in units of 10 lbs, 400 psi corresponds to 40 units. 
Furthermore, since $\alpha_1 + \alpha_2 = 0$, this gives $\alpha_1 = -20.0$ and 
$\alpha_2 = 20.0$. Now, from Section 4.11, the normalized noncentrality parameter is 

$$\phi = \sqrt{\frac{bn\sum_{i=1}^{a}\alpha_i^2}{a(\sigma_e^2 + n\sigma_{\alpha\beta}^2)}} = \sqrt{\frac{3 \times 3\{(-20.0)^2 + (20.0)^2\}}{2(563.167)}} = 2.53,$$

where an estimate of $\sigma_e^2 + n\sigma_{\alpha\beta}^2$ is obtained from $MS_{AB} = 563.167$. 
Since the Pearson-Hartley charts do not contain a power curve for $\nu_1 = 1$ and 
$\nu_2 = 2$, we calculate the power using the noncentral $t$ distribution. The 
noncentrality parameter $\delta$ for the noncentral $t$ distribution is determined 
as $\delta = \sqrt{a}\,\phi = \sqrt{2}(2.53) = 3.58$. Now, entering Appendix Chart I 
with $\alpha = 0.05$, df = 2, and $\delta = 3.58$, the power is found to be about 0.48. 
The use of Appendix Tables VI and VII with appropriate interpolation gives 
essentially the same result. Notice that the very small number of degrees of 
freedom for the $t$ test makes it quite insensitive. 
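If a chart is not at hand, the power can be approximated numerically. The sketch below assumes the standard representation $T' = (Z + \delta)/\sqrt{V/\nu}$, with $Z$ standard normal and $V$ an independent chi-square variable on $\nu$ degrees of freedom. It integrates $P(|T'| > t[\nu; 0.975])$ over the density of $V$ with a simple midpoint rule. The function names, truncation point, and step count are illustrative choices. With $\delta = 3.58$ and $\nu = 2$ the result is roughly 0.49, in line with the chart reading of about 0.48.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def chi2_pdf(v, nu):
    """Chi-square density with nu degrees of freedom (v > 0)."""
    if v <= 0.0:
        return 0.0
    log_pdf = ((nu / 2 - 1) * math.log(v) - v / 2
               - (nu / 2) * math.log(2.0) - math.lgamma(nu / 2))
    return math.exp(log_pdf)

def noncentral_t_power(delta, nu, t_crit, upper=200.0, steps=20000):
    """Two-sided power P(|T'| > t_crit) for a noncentral t with nu df.

    P(|Z + delta| > t_crit * sqrt(V / nu)) is integrated against the
    chi-square(nu) density of V using the midpoint rule on [0, upper].
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * h
        s = t_crit * math.sqrt(v / nu)
        tail = norm_cdf(delta - s) + norm_cdf(-delta - s)
        total += tail * chi2_pdf(v, nu) * h
    return total

# Yield-load example: a = 2, b = n = 3, alpha_1 = -20, alpha_2 = 20 (units of 10 lbs).
a, b, n = 2, 3, 3
alphas = (-20.0, 20.0)
ms_ab = 563.167  # estimate of sigma_e^2 + n * sigma_ab^2

phi = math.sqrt(b * n * sum(x * x for x in alphas) / (a * ms_ab))
delta = math.sqrt(a) * phi
power = noncentral_t_power(delta, nu=2, t_crit=4.303)  # t[2; 0.975] = 4.303 (assumed table value)

print(f"phi = {phi:.2f}, delta = {delta:.2f}, power = {power:.2f}")
```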


4.17 USE OF STATISTICAL COMPUTING PACKAGES 


As in Section 3.16, for a two-way fixed effects analysis of variance with an equal 
number of observations and no missing values, the recommended procedure is 
SAS ANOVA. If the $n_{ij}$’s are unequal because of only a few missing values, the 
missing values could be replaced by their respective cell means and the data could 
be analyzed as in the preceding. However, if there is a wide disparity between the 
$n_{ij}$’s, the GLM procedure should be used. GLM produces Type I, Type II, Type III, 
and Type IV sums of squares. When some cells are empty, caution should be 
used in choosing an appropriate sum of squares. For a random or mixed model 
analysis, GLM with the RANDOM or TEST option should be used. For equal $n_{ij}$’s 
in each cell, estimates of variance components can be readily obtained from the 
entries of the analysis of variance table. For unequal $n_{ij}$’s, PROC MIXED or 
VARCOMP must be used for estimating variance components. For the details 
of SAS commands, see Section 11.1. 

Among SPSS procedures, ANOVA would be a better choice for a fixed 
effects analysis involving a balanced layout. For designs involving an 
unequal number of observations per cell, and for random and mixed model 
analyses, GLM or MANOVA must be used. For the estimation of variance 
components, VARCOMP (available in Releases 7.0 and 8.0) is the procedure of 
choice. For instructions regarding SPSS commands, see Section 11.2. 

In using BMDP programs, as indicated in Section 3.15, the two programs 
suited for this model are 7D and 2V if the analysis involves only fixed effects 
in the model. However, when the number of observations in each cell is rather 
large, 7D would be a better choice since it can provide comparative histograms 
and descriptive statistics for the data in each cell. Similar to GLM, 2V is a general 
purpose program for performing fixed effects analysis of variance for both 
balanced and unbalanced data sets. For analyses involving random and mixed 
effects models, 3V and 8V can be used. For designs with equal $n_{ij}$’s in each cell, 
8V is recommended since it is simpler to use. For unequal $n_{ij}$’s, 3V would be the 
preferred choice. This program also provides estimates of variance components 
using maximum likelihood and restricted maximum likelihood procedures. 

Utmost care should be exercised in using packaged programs for unequal 
sample sizes and when some cells are empty. It is important to find out how the 
individual program or procedure handles the empty cells and the assumptions it 
makes about the interaction terms. The user should make sure that the program 
outputs the appropriate sums of squares for the tests of hypotheses of interest. 
For some further discussion and details in this regard, see Milliken and Johnson 
(1992, Chapter 14). 
4.18 WORKED EXAMPLES USING STATISTICAL PACKAGES 


In this section, we illustrate the applications of statistical packages to perform 
two-way analysis of variance with interaction for the data sets employed in 
examples presented in Sections 4.13 through 4.16. Figures 4.1 through 4.4 
illustrate the program instructions and the output results for analyzing data 
given in Tables 4.9, 4.11, 4.13, and 4.15, using SAS ANOVA/GLM, SPSS 
MANOVA/GLM, and BMDP 7D/2V/8V procedures. The typical output pro- 
vides the data format listed at the top, cell means, and the entries of the analysis 
of variance table. Note that in each case the results are the same as those 
obtained by the manual computations in Sections 4.13 through 4.16. However, 
certain tests of significance in a mixed model may differ from one program to 
another since the programs make different model assumptions. 


4.19 THE MEANING AND INTERPRETATION 
OF INTERACTION 


In the discussion of the two-way model (4.1.1), we have assumed the existence 
of interaction to take into account the fact that the two factors may not be 
independent; that is, the effects of one factor may vary with the levels of the 
other factor. Thus, for example, suppose that the yield of a chemical process 
depends on two factors: the concentration of the chemical and the operating 
temperature. Now, if the yield at different concentration levels varies with the 
level of the operating temperature, we would say that interaction is present. 
The lack or presence of interaction is marked by parallelism or nonparallelism 
in the plots of average treatment responses. For example, consider two levels 
for each factor A and B, denoted by $(A_1, A_2)$ and $(B_1, B_2)$, respectively. Some 
possible patterns for observed cell means and presence or lack of interactions 
are illustrated in Figure 4.5. The graphical illustrations allow a visual inspection 
of factor effects and their interactions. Any nonparallel change in the average 
response is an indication of the presence of an interaction. 
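Parallelism can also be checked numerically. In a two-way table of cell means the profiles are parallel exactly when every interaction residual $\bar{y}_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..}$ is zero. For a $2 \times 2$ layout this is equivalent to the single contrast $(\mu_{11} - \mu_{12}) - (\mu_{21} - \mu_{22})$ vanishing. The Python sketch below computes these residuals for two hypothetical tables of cell means. The numbers are invented for illustration and are not taken from Figure 4.5.

```python
def interaction_residuals(means):
    """Return ybar_ij - ybar_i. - ybar_.j + ybar_.. for a two-way table of cell means."""
    a, b = len(means), len(means[0])
    row = [sum(r) / b for r in means]                               # ybar_i.
    col = [sum(means[i][j] for i in range(a)) / a for j in range(b)]  # ybar_.j
    grand = sum(row) / a                                            # ybar_..
    return [[means[i][j] - row[i] - col[j] + grand for j in range(b)] for i in range(a)]

# Hypothetical cell means: rows are levels A1, A2; columns are levels B1, B2.
parallel = [[10.0, 14.0],
            [12.0, 16.0]]   # B2 - B1 = 4 at both levels of A: parallel profiles
crossing = [[10.0, 14.0],
            [16.0, 12.0]]   # B2 - B1 changes sign with A: nonparallel profiles

for label, table in [("parallel", parallel), ("crossing", crossing)]:
    res = interaction_residuals(table)
    print(label, [[round(x, 2) for x in r] for r in res])
```

The first table gives all-zero residuals (no interaction). The second gives residuals of $\pm 2$ with the checkerboard sign pattern typical of a crossing interaction.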

The existence or nonexistence of interaction effects, as inferred from the 
F test of interaction, can have a very important bearing on how one interprets 
and uses the results of an experiment. When two factors A and B interact, an 
important question arises as to whether the main effects of A and B are 
meaningful measures to interpret. Thus, if the hypothesis of no interaction is 
rejected, we may conclude that the effects of A and B are not additive; that is, 
factors A and B interact. If this happens, testing the significance of A and B 
factor effects becomes meaningless under the present formulation of the model. 
Note that accepting the hypothesis about the A factor effects means that there are 
no differences in the various levels of A when averaged over the levels of B. 
However, in the presence of interaction this interpretation is meaningless. The 
presence of interaction means that the effect of one factor is dependent on the 
levels of the other. Similarly, rejecting the hypothesis about the A factor effects 
when interaction is present is also meaningless. The same argument holds true 
about the effects of factor B. Thus, when stating the effects of one factor it is 
necessary to specify the level of the other. This is the most important meaning 
of the interaction, namely, when interactions are present, the factors themselves 
cannot be evaluated individually. The presence of interactions requires that the 
factors be evaluated jointly rather than individually. 


FIGURE 4.1 Program Instructions and Output for the Two-Way Fixed Effects 
Analysis of Variance with Two Observations per Cell: Data on Running Time (in 
seconds) to Complete a 1.5 Mile Course (Table 4.9). [Panels: (i) SAS application: 
SAS ANOVA instructions and output; (ii) SPSS application: SPSS MANOVA 
instructions and output; (iii) BMDP application: BMDP 7D instructions and output.] 


FIGURE 4.2 Program Instructions and Output for the Two-Way Fixed Effects 
Analysis of Variance with Unequal Numbers of Observations per Cell: Data on 
Weight Gains of Rats under Different Diets (Table 4.11). [Panels: (i) SAS 
application: SAS GLM instructions and output; (ii) SPSS application: SPSS MANOVA 
instructions and output; (iii) BMDP application: BMDP 2V instructions and output.] 


FIGURE 4.3 Program Instructions and Output for the Two-Way Random Effects 
Analysis of Variance with Two Observations per Cell: Data on Screen Lengths 
from a Quality Control Experiment (Table 4.13). [Panels: (i) SAS application: SAS 
GLM instructions and output; (ii) SPSS application: SPSS GLM instructions and 
output; (iii) BMDP application: BMDP 8V instructions and output.] 


Significant interactions serve as a warning: treatment differences possibly 
do exist, but to specify exactly how the treatments differ, one must look within 
the levels of the other factor. The presence of the interaction effects is a signal 
that in any predictive use of the results, effects ascribed to a particular treatment 
representing one factor are best qualified by specifying the level of the other 
factor. This is especially important if one is going to try to use estimated effects 
in forecasting the result of a treatment to an experimental unit. If interaction 
effects are present, the best forecast can be made only if the particular levels of 
both factors are known. 

When the observations suggest the presence of significant interactions, it 
is important to determine whether large interactions really do exist or whether 
there may be some other reasons for the presence of the interactions. Often large 
interactions may exist as a result of the dependent variable being measured on 
an inappropriate scale, and the use of a simple transformation may remove 
most of the interaction effects. Some simple transformations that are helpful 
in reducing the importance of interactions include the logarithmic, reciprocal, 
square, and square-root transformations (see Section 2.22 for a discussion of 
these transformations). 
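A multiplicative pattern is the classic case. If the cell means satisfy $\mu_{ij} = \theta_i\phi_j$, the profiles are not parallel on the original scale, but $\log\mu_{ij} = \log\theta_i + \log\phi_j$ is exactly additive. The hypothetical Python sketch below computes the sum of squared interaction residuals $\bar{y}_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..}$ for such a table before and after taking logarithms. The factor values $\theta_i$ and $\phi_j$ are invented for illustration.

```python
import math

def interaction_ss(means):
    """Sum of squared interaction residuals ybar_ij - ybar_i. - ybar_.j + ybar_.. over all cells."""
    a, b = len(means), len(means[0])
    row = [sum(r) / b for r in means]
    col = [sum(means[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row) / a
    return sum((means[i][j] - row[i] - col[j] + grand) ** 2
               for i in range(a) for j in range(b))

# Hypothetical multiplicative structure: mean_ij = theta_i * phi_j.
theta = [1.0, 2.0, 4.0]
phi = [1.0, 3.0, 9.0]
raw = [[t * p for p in phi] for t in theta]
logged = [[math.log(x) for x in r] for r in raw]

print(f"interaction SS, raw scale: {interaction_ss(raw):.4f}")
print(f"interaction SS, log scale: {interaction_ss(logged):.4f}")
```

On the raw scale the interaction sum of squares is large. After the log transformation it is zero apart from floating-point rounding.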

Sometimes, the investigator may think that there are no interactions; however, 
the data obtained may indicate a considerable amount of interaction. This 
could happen purely by chance variation. On the other hand, such 
unexpectedly large interactions may simply be due to the presence of outliers 
(observations much different from the rest of the data). The entire interaction 
may depend upon just one observation that may be wrong or an outlier. One 
should examine the data carefully for outliers before discarding any 
observations. If after further examination, the data look normal, there is 
the possibility of some complicated and unsuspected phenomenon that may 
require investigation. If the observations were not made using some random 
device, considerable time effects may be embedded in the data obtained. Thus, 
the error terms can no longer be assumed to be uncorrelated. In some cases, 
an uncontrolled variable may affect the observations and produce an apparent 
interaction. For example, in a laboratory experiment involving mice, the location 
of the cage may have an effect on the outcome, and if this factor was left 
uncontrolled or the mice were not randomly assigned, we might observe an 
apparent interaction when there was none. 

It has also been found that interactions frequently occur when the main effects 
are large. Interactions usually become less important when the differences 
among the levels of treatment are reduced, thus moderating the size of the main 
effects. For this reason, the presence of interaction effects can be most important 
to the interpretation of the experiment. Although it is necessary to consider 
possible interaction effects even in fairly simple experiments, the subject of 
interaction and of the interpretation that should be given to significant tests for 
interaction is neither simple nor fully explored. For a broad review of various 
aspects of interactions, see Cox (1984). 


FIGURE 4.4 Program Instructions and Output for the Two-Way Mixed Effects 
Analysis of Variance with Three Observations per Cell: Data on Yield Loads for 
Cement Specimens (Table 4.15). [Panels: (i) SAS application: SAS GLM 
instructions and output; (ii) SPSS application: SPSS GLM instructions and output; 
(iii) BMDP application: BMDP 8V instructions and output.] 


4.20 INTERACTION WITH ONE OBSERVATION PER CELL 


In the discussion of model (3.1.1) in Chapter 3, we assumed that there are 
no interaction terms. If the existence of interaction is assumed, model (3.1.1) 
becomes 

$$y_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ij}, \qquad (4.20.1)$$

where $\mu$, the $\alpha_i$’s, $\beta_j$’s, and $e_{ij}$’s are defined as in model (3.1.1), and the $(\alpha\beta)_{ij}$’s are the 
interaction effects between factors A and B, which are assumed to be constants 
under Model I and to be randomly distributed with mean zero and variance $\sigma_{\alpha\beta}^2$ 
under Models II and III. Proceeding as before, the pertinent analysis of variance 
can be derived and is shown in Table 4.17. 

On comparing Tables 3.2 and 4.17, it is seen that they are the same except for the 
expressions in the expected mean square column. In Table 4.17, 
if we let $(\alpha\beta)_{ij} = 0$ or $\sigma_{\alpha\beta}^2 = 0$ and change the word interaction to error, we 
have exactly the same analysis of variance table as in Table 3.2. However, if the 
assumption of no interaction is not tenable, we have the following inferential 
problems. Under Model I, no direct tests are possible, since for the hypothesis 
that the $\alpha_i$’s, the $\beta_j$’s, or the $(\alpha\beta)_{ij}$’s are zero there is no 
suitable pair of mean squares to compare. When the interactions are present, 
$SS_{AB}$ has a noncentral chi-square distribution and the F ratios $MS_A/MS_{AB}$ and 
$MS_B/MS_{AB}$ have doubly noncentral F distributions.⁸ The usual F tests for 
the main effects $\alpha_i$’s and $\beta_j$’s may be inefficient if there is appreciable interaction, 
so that $\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2 \neq 0$, since the denominator mean square will be 
inflated by the extra component. On the other hand, if either variance ratio is 
significant, it may be taken that the corresponding effect is real. 

FIGURE 4.5 Patterns of Observed Cell Means and Existence or Nonexistence 
of Interaction Effects: (a) No effect of factor A, large effect of factor B, and no 
AB interaction; (b) Large effect of factor A, moderate effect of factor B, and no 
AB interaction; (c) No effect of factor A, large effect of factor B, and large AB 
interaction; (d) No effect of factor A, no effect of factor B, but large AB interaction; 
(e) Large effect of factor A, no effect of factor B, with small AB interaction; (f) 
Large effect of factor A, small effect of factor B, with small AB interaction. (The 
graphs (a) to (f) are obtained by representing the levels of factor A as values on the 
x-axis and plotting the cell means at those levels as values on the y-axis. Separate 
curves are drawn for each level of factor B. Alternatively, one could represent the 
levels of B on the x-axis and draw separate curves for each level of factor A.) 

Even though there are no direct tests for interaction effects, Tukey (1949b) 
has devised a test that may be used for testing the existence of interaction terms. 
The null hypothesis and the alternative are:⁹ 

$$H_0: (\alpha\beta)_{ij} = 0, \quad i = 1, 2, \ldots, a;\ j = 1, 2, \ldots, b \quad \text{versus} \quad H_1: \text{not all } (\alpha\beta)_{ij}\text{’s are zero}. \qquad (4.20.2)$$


The procedure requires the computation of the sum of squares for nonadditivity 
defined by 

$$SS_N = \frac{\left[\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij}(\bar{y}_{i.} - \bar{y}_{..})(\bar{y}_{.j} - \bar{y}_{..})\right]^2}{\sum_{i=1}^{a}(\bar{y}_{i.} - \bar{y}_{..})^2 \sum_{j=1}^{b}(\bar{y}_{.j} - \bar{y}_{..})^2}. \qquad (4.20.3)$$

It can be shown that under $H_0$, the statistic 

$$F^* = \frac{SS_N/1}{(SS_{AB} - SS_N)/[(a-1)(b-1)-1]} \qquad (4.20.4)$$

is distributed as an $F[1, (a-1)(b-1)-1]$ variable. Note that there is one degree 
of freedom associated with $SS_N$ and $(a-1)(b-1)-1 = ab-a-b$ degrees 
of freedom are associated with $SS_{AB} - SS_N$. Thus, a large value of $F^*$ leads to 
the rejection of $H_0$. 
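For computational purposes, the $SS_N$ term can be further simplified by 
expanding the numerator in (4.20.3) into four terms and then rearranging them as 
follows: 

$$SS_N = \frac{\left[\sum_{i=1}^{a}\sum_{j=1}^{b} y_{i.}\,y_{ij}\,y_{.j} - y_{..}\left(\sum_{i=1}^{a}\frac{y_{i.}^2}{b} + \sum_{j=1}^{b}\frac{y_{.j}^2}{a} - \frac{y_{..}^2}{ab}\right)\right]^2}{ab\,SS_A\,SS_B}. \qquad (4.20.5)$$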

The first term in the numerator of (4.20.5), that is, $\sum_{i=1}^{a}\sum_{j=1}^{b} y_{i.}y_{ij}y_{.j}$, can 
be more easily calculated by rewriting it as $\sum_{i=1}^{a} y_{i.}\left[\sum_{j=1}^{b} y_{ij}y_{.j}\right]$. The 
expression within parentheses in the numerator of (4.20.5) is equivalent to 
$SS_A + SS_B + y_{..}^2/ab$. 

8 For a definition of the doubly noncentral F distribution, see Appendix H. 

9 Technically speaking, the interaction hypothesis in (4.20.2) is incorrect. Tukey (1949b) 
considered interactions of the form $(\alpha\beta)_{ij} = G\alpha_i\beta_j$ with $G$ fixed but unknown, leading to a 
one-degree-of-freedom test of the interaction hypothesis $H_0: G = 0$. 

The test is commonly known as Tukey’s one degree of freedom test for 
nonadditivity. It is discussed in detail by Scheffé (1959, pp. 129-134) and Rao 
(1973, pp. 249-255) and a numerical example appears in Ostle and Mensing 
(1975, Section 11.3). 
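The computations in (4.20.3) through (4.20.5) can be sketched in Python as follows. The functions are hypothetical helpers written for this illustration. The input is assumed to be an $a \times b$ table with one observation per cell, and $SS_{AB}$ is taken to be the residual sum of squares of the additive fit, $\sum\sum(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2$. The data are made up, since the Table 3.3 observations are not reproduced here. The second function evaluates the computational form (4.20.5) so that the two expressions for $SS_N$ can be compared.

```python
def tukey_nonadditivity(y):
    """Tukey's one-degree-of-freedom test for an a x b table, one observation per cell.

    Returns (SS_N, SS_AB, F*): SS_N from (4.20.3), SS_AB the residual SS of the
    additive model, and F* from (4.20.4).
    """
    a, b = len(y), len(y[0])
    row = [sum(r) / b for r in y]                                   # ybar_i.
    col = [sum(y[i][j] for i in range(a)) / a for j in range(b)]    # ybar_.j
    grand = sum(row) / a                                            # ybar_..

    num = sum(y[i][j] * (row[i] - grand) * (col[j] - grand)
              for i in range(a) for j in range(b))
    den = sum((r - grand) ** 2 for r in row) * sum((c - grand) ** 2 for c in col)
    ss_n = num ** 2 / den

    # Residual SS after removing row and column effects (the SS_AB of Table 4.17).
    ss_ab = sum((y[i][j] - row[i] - col[j] + grand) ** 2
                for i in range(a) for j in range(b))

    df_rem = (a - 1) * (b - 1) - 1
    f_star = ss_n / ((ss_ab - ss_n) / df_rem)
    return ss_n, ss_ab, f_star


def ss_n_computational(y):
    """SS_N from the computational form (4.20.5), using totals y_i., y_.j, y_..."""
    a, b = len(y), len(y[0])
    yi = [sum(r) for r in y]
    yj = [sum(y[i][j] for i in range(a)) for j in range(b)]
    yt = sum(yi)
    ss_a = sum(t * t for t in yi) / b - yt * yt / (a * b)
    ss_b = sum(t * t for t in yj) / a - yt * yt / (a * b)
    first = sum(yi[i] * sum(y[i][j] * yj[j] for j in range(b)) for i in range(a))
    bracket = first - yt * (ss_a + ss_b + yt * yt / (a * b))
    return bracket ** 2 / (a * b * ss_a * ss_b)


# Hypothetical 4 x 3 table (rows: levels of A, columns: levels of B).
y = [[10, 12, 15],
     [11, 14, 18],
     [13, 17, 24],
     [14, 19, 27]]
ss_n, ss_ab, f_star = tukey_nonadditivity(y)
print(f"SS_N = {ss_n:.3f}, SS_AB = {ss_ab:.3f}, F* = {f_star:.1f}")
print(f"SS_N via (4.20.5) = {ss_n_computational(y):.3f}")
```

For a $4 \times 3$ table, $F^*$ is referred to $F[1, 5; 0.95] = 6.61$. The invented table above has a nearly multiplicative structure, so almost all of its residual variation is absorbed by $SS_N$ and $F^*$ is very large.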


Example 1. We illustrate Tukey’s test for the data of Table 3.3. To calculate 
the test statistic (4.20.4), we obtain 

$$\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij}(\bar{y}_{i.} - \bar{y}_{..})(\bar{y}_{.j} - \bar{y}_{..}) = 241(261.67 - 239.5)(226.25 - 239.5) + \cdots + 227(232.33 - 239.5)(237.25 - 239.5) = -2{,}332.1405,$$

$$\sum_{i=1}^{a}(\bar{y}_{i.} - \bar{y}_{..})^2 = \frac{SS_A}{b} = \frac{3{,}141.667}{3} = 1{,}047.2223,$$

$$\sum_{j=1}^{b}(\bar{y}_{.j} - \bar{y}_{..})^2 = \frac{SS_B}{a} = \frac{1{,}683.500}{4} = 420.8750.$$

Hence, from (4.20.3), we have 

$$SS_N = \frac{(-2{,}332.1405)^2}{(1{,}047.2223)(420.8750)} = 12.340.$$

Finally, we obtain the test statistic (4.20.4) as 

$$F^* = \frac{12.340/1}{(SS_{AB} - 12.340)/(6-1)} = 0.05.$$

Assuming the level of significance $\alpha = 0.05$, we obtain $F[1, 5; 0.95] = 6.61$. 
Since $F^* < 6.61$, we may conclude that the data give no evidence of interaction 
between material and position (p = 0.832). The use of the no-interaction model 
for the data in Table 3.3, therefore, seems to be reasonable. 
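The p-value quoted above can be recovered from $F^*$ alone. An $F[1, \nu]$ variable is the square of a Student $t$ variable with $\nu$ degrees of freedom, so $P(F > f) = 1 - 2\int_0^{\sqrt{f}} t_\nu(u)\,du$. The sketch below (standard library only; the function names are illustrative) evaluates that integral with Simpson’s rule. With the rounded value $F^* = 0.05$ and $\nu = 5$ it gives approximately 0.832. As a check, $F = 6.61$ returns approximately 0.05.

```python
import math

def t_pdf(t, nu):
    """Student t density with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c) * (1 + t * t / nu) ** (-(nu + 1) / 2)

def f1_p_value(f_stat, nu, steps=1000):
    """Upper-tail p-value of F[1, nu], using F = T^2 and Simpson's rule on [0, sqrt(F)]."""
    x = math.sqrt(f_stat)
    h = x / steps                      # steps must be even for Simpson's rule
    s = t_pdf(0.0, nu) + t_pdf(x, nu)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    area = s * h / 3                   # P(0 < T < sqrt(F))
    return 1.0 - 2.0 * area

print(f"p = {f1_p_value(0.05, 5):.3f}")
```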


Remarks: (i) The power function of Tukey’s test for nonadditivity has been studied by 
Ghosh and Sharma (1963), and Hegemann and Johnson (1976). Milliken and Graybill 
(1971) developed tests for interaction in the two-way model with missing data. In addi- 
tion, a variety of tests that are sensitive to particular non-additivity structures have also 
been proposed in the literature (see, e.g., Mandel (1971), Hirotsu (1983), Miyakawa 
(1993)). Krishnaiah and Yochmowitz (1980), Johnson and Graybill (1972a), and Boik 
(1993) provide reviews of additivity tests. For a generalization of Tukey’s test for a 
two-way classification to any general analysis of variance or experimental design model, 
see Milliken and Graybill (1970). 

(ii) If Tukey’s test shows the presence of interaction effects, some simple 
transformations such as a square-root or logarithmic transformation may be employed to see if the 
interaction can be removed or made negligible. Johnson and Graybill (1972b) discuss 
an approximate method of analysis in the presence of interaction effects. 

(iii) Tukey’s test of nonadditivity can be performed using the SAS GLM procedure by 
first fitting a two-way model with factors A and B as sources. The predicted values 
from the fitted model are squared, and then the procedure is run again with a MODEL 
statement that includes A, B, and the squared predictions as sources. The squared predictions 
do not appear in the CLASS statement and appear as the last term of the MODEL statement 
(as a covariate). The following SAS code illustrates the procedure: 


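   data tukey;
      input a b y;                /* level of A, level of B, response */
      /* one data line per cell goes between DATALINES and the semicolon */
      datalines;
   ;
   proc glm;                      /* additive two-way fit */
      class a b;
      model y = a b;
      output out = f p = pred;    /* save predicted values in data set f */
   run;

   data ft;
      set f;
      p2 = pred * pred;           /* squared predictions */
   run;

   proc glm;                      /* refit with p2 as a covariate */
      class a b;
      model y = a b p2;           /* SS for p2: Tukey's SS for nonadditivity */
   run;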
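The GLM recipe works because, once A and B are fitted, the only part of $\hat y_{ij}^2$ not absorbed by the row and column effects is $2\hat\alpha_i\hat\beta_j$, so the extra sum of squares for the covariate reproduces Tukey's $SS_N$. The Python sketch below checks this numerically without SAS, assuming a complete table with one observation per cell, where the residuals from fitting A and B are obtained by double centering. The helper names and the data are hypothetical.

```python
from statistics import fmean


def double_center(z):
    """Residuals of z after fitting row and column effects (complete table, one obs per cell)."""
    a, b = len(z), len(z[0])
    grand = fmean(v for row in z for v in row)
    rm = [fmean(row) for row in z]
    cm = [fmean(z[i][j] for i in range(a)) for j in range(b)]
    return [[z[i][j] - rm[i] - cm[j] + grand for j in range(b)] for i in range(a)]


def covariate_ss(y):
    """Extra SS for pred**2 entered last after A and B (what the second PROC GLM step reports)."""
    a, b = len(y), len(y[0])
    res = double_center(y)                                       # residuals of the additive fit
    pred = [[y[i][j] - res[i][j] for j in range(b)] for i in range(a)]
    q = double_center([[p * p for p in row] for row in pred])    # pred**2 adjusted for A and B
    sxy = sum(res[i][j] * q[i][j] for i in range(a) for j in range(b))
    sxx = sum(q[i][j] ** 2 for i in range(a) for j in range(b))
    return sxy * sxy / sxx


table = [[11, 7, 9], [7, 14, 9], [9, 9, 15]]  # hypothetical data
print(covariate_ss(table))  # Tukey's SS_N for this table, 16 up to rounding
```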


Under Model II, it is possible to test for the main effects (i.e., $\sigma^2_\alpha = 0$ or $\sigma^2_\beta = 0$) by dividing their respective mean squares by the interaction mean square.¹⁰ However, we have lost our F test for the hypothesis $\sigma^2_{\alpha\beta} = 0$, although Tukey’s one-degree-of-freedom test for nonadditivity described in the foregoing can also be used here. In regard to the point estimation of variance components, the estimators (3.7.17) and (3.7.18) are still unbiased for $\sigma^2_\alpha$ and $\sigma^2_\beta$, but our estimate (3.7.16) of the error variance is biased since $E(MS_E) = \sigma^2 + \sigma^2_{\alpha\beta}$. There is not much one can do about $MS_E$ being a biased estimator of $\sigma^2$ except to assume that $\sigma^2_{\alpha\beta} = 0$; if the assumption is incorrect, one errs on the conservative side because $MS_E$ will tend to overestimate $\sigma^2$.


Similar remarks apply for Model III. 
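For Model II with one observation per cell, the usual ANOVA estimators are $\hat\sigma^2_\alpha = (MS_A - MS_E)/b$ and $\hat\sigma^2_\beta = (MS_B - MS_E)/a$, where $MS_E$ is the residual (interaction) mean square; we assume these are the estimators referred to above as (3.7.17) and (3.7.18). The Python sketch below computes them from a table; the function name and data are hypothetical. As noted above, the returned $MS_E$ estimates $\sigma^2 + \sigma^2_{\alpha\beta}$, not $\sigma^2$.

```python
from statistics import fmean


def model2_estimates(y):
    """ANOVA estimates for the two-way random model with one observation per cell."""
    a, b = len(y), len(y[0])
    grand = fmean(v for row in y for v in row)
    row_means = [fmean(row) for row in y]
    col_means = [fmean(y[i][j] for i in range(a)) for j in range(b)]

    ms_a = b * sum((m - grand) ** 2 for m in row_means) / (a - 1)
    ms_b = a * sum((m - grand) ** 2 for m in col_means) / (b - 1)
    ss_e = sum((y[i][j] - row_means[i] - col_means[j] + grand) ** 2
               for i in range(a) for j in range(b))
    ms_e = ss_e / ((a - 1) * (b - 1))  # estimates sigma^2 + sigma^2_ab, not sigma^2

    return {
        "MS_A": ms_a, "MS_B": ms_b, "MS_E": ms_e,
        "sigma2_alpha": (ms_a - ms_e) / b,  # unbiased even if interaction is present
        "sigma2_beta": (ms_b - ms_e) / a,
    }


table = [[6, 5, 10], [6, 14, 10], [12, 11, 16]]  # hypothetical data
print(model2_estimates(table))
# MS_A = 27, MS_B = 12, MS_E = 9, sigma2_alpha = 6, sigma2_beta = 1
```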


4.21 ALTERNATE MIXED MODELS 


As remarked in Section 4.2, several different types of mixed models have been proposed in the statistical literature. Among others are the models proposed by Tukey (1949a), Wilk and Kempthorne (1955, 1956), Scheffé (1956b), and Smith and Murray (1984).¹¹ These models differ from the “standard” mixed model, discussed earlier in this chapter, in the assumptions made about the random effects $\beta_j$'s and $(\alpha\beta)_{ij}$'s. In this section, we briefly describe one of these


10 For a discussion of the robustness of the tests for $\sigma^2_\alpha$ and $\sigma^2_\beta$, see Tan (1981).
11 Smith and Murray (1984) proposed a model that employs covariance components to allow negative correlations among observations within the same cell.
TABLE 4.18
Analysis of Variance for an Alternate Mixed Model

Source of        Degrees of          Sum of      Mean        Expected
Variation        Freedom             Squares     Square      Mean Square

Due to A         $a - 1$             $SS_A$      $MS_A$      $\sigma^2 + n\sigma^2_{\alpha\beta} + \frac{bn}{a-1}\sum_{i=1}^a \alpha_i^2$
Due to B         $b - 1$             $SS_B$      $MS_B$      $\sigma^2 + n\sigma^2_{\alpha\beta} + an\sigma^2_\beta$
Interaction      $(a - 1)(b - 1)$    $SS_{AB}$   $MS_{AB}$   $\sigma^2 + n\sigma^2_{\alpha\beta}$
  A × B
Error            $ab(n - 1)$         $SS_E$      $MS_E$      $\sigma^2$


alternate models. Suppose that the $\alpha_i$'s are fixed effects such that $\sum_{i=1}^a \alpha_i = 0$ and the $\beta_j$'s are independently distributed normal random variables with mean zero and variance $\sigma^2_\beta$. The interaction effects $(\alpha\beta)_{ij}$'s are also independently distributed normal random variables with mean zero and variance $\sigma^2_{\alpha\beta}$, and the $(\alpha\beta)_{ij}$'s are independent of the $\beta_j$'s. Note that the main difference between this mixed model and the “standard” mixed model discussed earlier is the assumption about the independence of the interaction effects.¹² The analysis of variance and expected mean squares for this model are shown in Table 4.18. On comparing this table to Table 4.2, we note that the only noticeable difference is the inclusion of the variance component $\sigma^2_{\alpha\beta}$ in the expected mean square for the random factor B, which does not appear in the “standard” model. (In fact, there are some other minor differences due to different definitions of the variance of the interaction effects in the two models, but these do not affect the analysis.)
Under this model, the hypothesis

$$H_0^B: \sigma^2_\beta = 0 \quad \text{versus} \quad H_1^B: \sigma^2_\beta > 0$$

would be tested by the statistic

$$F_B = \frac{MS_B}{MS_{AB}},$$

in contrast with the statistic $F_B = MS_B/MS_E$ used in the “standard” model. The test is usually more conservative than the one based on the “standard” model, since $MS_{AB}$ will in general be larger than $MS_E$. Again, the analysis of variance procedure may be used to estimate the variance components. From the expected mean square column of Table 4.18, we find that the only variance component which would have different estimates from the ones obtained in the “standard”


12 The major criticism of the mixed model described here concerns the assumption of independence of the $(\alpha\beta)_{ij}$'s, since it is felt that these random terms within a given level of factor B will often be correlated.
model is $\sigma^2_\beta$, which is now estimated by

$$\hat\sigma^2_\beta = \frac{MS_B - MS_{AB}}{an}.$$
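The practical difference between the two mixed models for factor B shows up when both sets of statistics are computed from the same mean squares. The sketch below is a minimal illustration, assuming a balanced layout, the expected mean squares of Table 4.18 for the alternate model, and, for the “standard” model, $E(MS_B) = \sigma^2 + an\sigma^2_\beta$ and $E(MS_E) = \sigma^2$. The function name and the mean squares are hypothetical.

```python
def factor_b_statistics(ms_b, ms_ab, ms_e, a, n):
    """F statistic and ANOVA estimate of sigma^2_beta under the two mixed models."""
    return {
        # 'standard' mixed model: E(MS_B) = sigma^2 + a n sigma^2_beta
        "standard": {"F_B": ms_b / ms_e, "sigma2_beta": (ms_b - ms_e) / (a * n)},
        # alternate model (Table 4.18): E(MS_B) = sigma^2 + n sigma^2_ab + a n sigma^2_beta
        "alternate": {"F_B": ms_b / ms_ab, "sigma2_beta": (ms_b - ms_ab) / (a * n)},
    }


# Hypothetical mean squares with a = 3 levels of A and n = 2 replicates per cell
stats = factor_b_statistics(ms_b=50.0, ms_ab=10.0, ms_e=4.0, a=3, n=2)
print(stats)
# standard: F_B = 12.5, sigma2_beta = 46/6; alternate: F_B = 5.0, sigma2_beta = 40/6
```

The two F ratios are referred to different distributions: $F[b-1, ab(n-1)]$ in the “standard” model and $F[b-1, (a-1)(b-1)]$ in the alternate model, which is one reason the alternate test is usually the more conservative of the two.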
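The hypothesis concerning the fixed effects factor is

$$H_0^A: \alpha_i = 0 \text{ for all } i \quad \text{versus} \quad H_1^A: \alpha_i \neq 0 \text{ for at least one } i.$$

An exact $\alpha$-level test of $H_0^A$ is to reject $H_0^A$ if $MS_A/MS_{AB} > F[a-1, (a-1)(b-1); 1-\alpha]$. Notice that this is the same test as the one under the “standard” model discussed in Section 4.6. The fixed effects $\mu$, $\mu + \alpha_i$, $\alpha_i$, and $\alpha_i - \alpha_{i'}$ are, of course, estimated by

$$\hat\mu = \bar y_{...}, \qquad \widehat{\mu + \alpha_i} = \bar y_{i..}, \qquad \hat\alpha_i = \bar y_{i..} - \bar y_{...},$$

and

$$\widehat{\alpha_i - \alpha_{i'}} = \bar y_{i..} - \bar y_{i'..}, \qquad i \neq i',$$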


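which are the same estimates as in the “standard” model. However, the variances of the estimates will differ from those in the “standard” model. Thus,

$$\operatorname{Var}(\bar y_{...}) = \frac{\sigma^2 + n\sigma^2_{\alpha\beta} + an\sigma^2_\beta}{abn},$$

$$\operatorname{Var}(\bar y_{i..}) = \frac{\sigma^2 + n\sigma^2_{\alpha\beta} + n\sigma^2_\beta}{bn},$$

$$\operatorname{Var}(\bar y_{ij.}) = \frac{\sigma^2 + n\sigma^2_{\alpha\beta} + n\sigma^2_\beta}{n},$$

and

$$\operatorname{Var}(\bar y_{i..} - \bar y_{i'..}) = \frac{2(\sigma^2 + n\sigma^2_{\alpha\beta})}{bn}.$$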


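An unbiased estimate of $\operatorname{Var}(\bar y_{i..})$ can be obtained by using an appropriate linear combination of the mean squares, namely $[MS_B + (a-1)MS_{AB}]/(abn)$, since $E[MS_B + (a-1)MS_{AB}] = a(\sigma^2 + n\sigma^2_{\alpha\beta} + n\sigma^2_\beta)$. An approximate $1 - \alpha$ level confidence interval for $\mu + \alpha_i$ can be constructed using

$$\bar y_{i..} \pm t[\nu, 1-\alpha/2]\sqrt{\frac{\hat\sigma^2 + n\hat\sigma^2_{\alpha\beta} + n\hat\sigma^2_\beta}{bn}},$$

where the degrees of freedom $\nu$ are estimated using the Satterthwaite procedure. Similarly, an exact confidence interval for $\alpha_i - \alpha_{i'}$ is given by

$$\bar y_{i..} - \bar y_{i'..} \pm t[(a-1)(b-1), 1-\alpha/2]\sqrt{\frac{2MS_{AB}}{bn}},$$

and an exact confidence interval for a general contrast of the form $\sum_{i=1}^a \ell_i\alpha_i$ (with $\sum_{i=1}^a \ell_i = 0$) is

$$\sum_{i=1}^a \ell_i\bar y_{i..} \pm t[(a-1)(b-1), 1-\alpha/2]\sqrt{MS_{AB}\sum_{i=1}^a \ell_i^2/(bn)},$$

which is exactly the same as that given by (4.8.9).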
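The interval for $\mu + \alpha_i$ combines $MS_B$ and $MS_{AB}$ with coefficients $c_B = 1/a$ and $c_{AB} = (a-1)/a$, and Satterthwaite's approximation gives $\nu = (\sum_k c_k MS_k)^2 / \sum_k (c_k MS_k)^2/f_k$, where $f_k$ are the degrees of freedom of the mean squares. The Python sketch below computes the estimated variance, $\nu$, and the standard error used in the exact interval for $\alpha_i - \alpha_{i'}$. It stops short of the $t$ quantiles, which must be read from a table or obtained from a statistics library (for example, SciPy's `scipy.stats.t.ppf`); the function name and inputs are hypothetical.

```python
import math


def alternate_model_interval_parts(ms_b, ms_ab, a, b, n):
    """Pieces of the confidence intervals for mu + alpha_i and alpha_i - alpha_i'."""
    # Coefficients chosen so that E[c_b MS_B + c_ab MS_AB] = sigma^2 + n sigma^2_ab + n sigma^2_beta
    c_b, c_ab = 1.0 / a, (a - 1.0) / a
    combo = c_b * ms_b + c_ab * ms_ab
    var_mean = combo / (b * n)  # unbiased estimate of Var(ybar_i..)

    # Satterthwaite approximation to the degrees of freedom of combo
    df_b, df_ab = b - 1, (a - 1) * (b - 1)
    nu = combo ** 2 / ((c_b * ms_b) ** 2 / df_b + (c_ab * ms_ab) ** 2 / df_ab)

    # Exact standard error of ybar_i.. - ybar_i'.., used with t[(a-1)(b-1), 1 - alpha/2]
    se_diff = math.sqrt(2.0 * ms_ab / (b * n))
    return var_mean, nu, se_diff


var_mean, nu, se_diff = alternate_model_interval_parts(ms_b=50.0, ms_ab=10.0, a=3, b=4, n=2)
print(var_mean, nu, se_diff)  # about 2.9167, 5.444, 1.5811
```

Because $\nu$ is generally not an integer, the $t$ quantile is either interpolated from a table or computed directly.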

The “standard” model as well as the new mixed model described here are special cases of the mixed model discussed by Scheffé (1956b; 1959, pp. 261–274). According to Scheffé’s model, the observation $y_{ijk}$ is represented by

$$y_{ijk} = m_{ij} + e_{ijk}, \qquad i = 1, 2, \ldots, a;\ j = 1, 2, \ldots, b;\ k = 1, 2, \ldots, n,$$

where the $m_{ij}$'s and $e_{ijk}$'s are mutually and completely independent random variables. Furthermore, $m_{ij}$ is given by the linear structure:


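$$m_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij},$$

where

$$E(m_{ij}) = \mu + \alpha_i,$$

with

$$\sum_{i=1}^a \alpha_i = 0, \qquad \beta_j = \frac{1}{a}\sum_{i=1}^a m_{ij} - \mu,$$

and

$$\sum_{i=1}^a (\alpha\beta)_{ij} = 0, \qquad j = 1, 2, \ldots, b.$$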
i=1 


Thus, for Scheffé’s model the restrictions on the $(\alpha\beta)_{ij}$'s are the same as in the “standard” mixed model discussed in Section 4.2. The main difference between them arises in specifying the variance-covariance structure of the random components $\beta_j$ and $(\alpha\beta)_{ij}$.


Remark: Scheffé assumes that the vectors $(\beta_j, (\alpha\beta)_{1j}, (\alpha\beta)_{2j}, \ldots, (\alpha\beta)_{aj})$, $j = 1, \ldots, b$, are independent multivariate normal vectors that satisfy the constraint $\sum_{i=1}^a (\alpha\beta)_{ij} = 0$ for each $j$. This implies that $(\alpha\beta)_{1j}, (\alpha\beta)_{2j}, \ldots, (\alpha\beta)_{aj}$ are dependent on $\beta_j$. He further defines a covariance structure for the $\{m_{ij}\}$, and it is possible to express the variances and covariances of the $\beta_j$'s and $(\alpha\beta)_{ij}$'s indirectly by stating the elements of this variance-covariance matrix. Thus, the two mixed models discussed in this chapter are rather special cases of Scheffé’s model. The analysis of Scheffé’s model is similar to that of the “standard” model, and the F tests for the hypotheses $H_0^B: \sigma^2_\beta = 0$ and $H_0^{AB}: \sigma^2_{\alpha\beta} = 0$ are exactly the same as for the “standard” model. However, the distribution theory of $MS_A$ and $MS_{AB}$ is much more complicated; in general, the statistic $MS_A/MS_{AB}$ is not distributed as an F variable when $H_0^A: \alpha_i = 0$ is true. The only way to obtain an exact test for this problem is to consider it in a multivariate framework, which leads to Hotelling’s $T^2$ test (Scheffé, 1959, pp. 270–274). Scheffé avoids this procedure and instead suggests the use of the ratio $MS_A/MS_{AB}$, which can be approximated as an F variable with $a - 1$ and $(a - 1)(b - 1)$ degrees of freedom. Another difference is that, even though the hypothesis $H_0^{AB}: \sigma^2_{\alpha\beta} = 0$ may be tested by the statistic $MS_{AB}/MS_E$, which under $H_0^{AB}$ has an F distribution with $(a - 1)(b - 1)$ and $ab(n - 1)$ degrees of freedom, the power is not expressible in terms of the central or noncentral F distribution, since $SS_{AB}$ is not distributed as a constant times a chi-square variable when $H_0^{AB}$ is false. The power of this test has been studied by Imhof (1958). For a discussion of multiple comparison methods for Scheffé’s mixed model, see Hochberg and Tamhane (1983).


In view of the numerous versions of mixed models, a natural question arises as to which model should be employed. Most practitioners tend to favor the “standard” model, and it is the one most often discussed in the literature. Furthermore, the results on expected mean squares under sampling from a finite population agree in form with those of the “standard” model (see Chapter IX). If the correlations among the random components are not high, then either mixed model can be used, and there are only minor differences between them. However, if the correlations tend to be large, then Scheffé’s model should be preferred. The choice among different mixed models should always be guided by the correlation structure of the observed data and by the extent to which the correlations between the random components affect the characteristics of the tests and estimation procedures of the different mixed models.


4.22 EFFECTS OF VIOLATIONS OF ASSUMPTIONS 
OF THE MODEL 


The list of assumptions for the model (4.1.1) is almost an exact parallel to the 
list of assumptions for models (2.1.1) and (3.1.1). Similar assumptions are made 
for more complex experiments entailing higher-order classifications. Thus, as 
one may anticipate, the same violations of assumptions are possible in the 
two- or multi-way crossed classifications as in the one-way classification. In 
this section, we briefly summarize some known results concerning the effects 
of violations of assumptions on the inference of the model (4.1.1). Further 
discussions on this topic can be found in Scheffé (1959, Chapter X) and Miller 
(1986, Chapter IV). 
MODEL I (FIXED EFFECTS)


For experiments involving balanced or nearly balanced designs, with relatively 
large numbers of observations per cell, the assumption of normality for error 
terms seems to be rather unimportant. However, for severely unbalanced ex- 
periments, the heavy-tailed or contaminated distributions may produce outliers 
and thereby distort the results on estimates and tests of significance. Thus, in 
an experiment, if the observations are suspected to depart from normality, then 
perhaps a balanced design with a correspondingly large number of observa- 
tions per cell should be used. Furthermore, if the data yield an equal number of 
observations in each cell, then the requirement of equal error variance in each 
cell, if violated, may not involve any serious risk. 

Krutchkoff (1989) carried out a simulation study comparing the usual F test with a new procedure called the K test. The results indicated that the F test had a larger type I error and decreased power. In designs involving unequal numbers of observations per cell, “... the size of the F test was inflated when the larger errors were on the cells with the smaller number of observations and deflated when the larger errors were on the cells with the larger number of observations.” In both situations, there was a decrease in power, but the drop was much more serious in the latter case. However, the K test was generally insensitive to the heterogeneity of variances. Consequently, there are two good reasons for planning an experiment with an equal number of observations per cell: the design will be balanced, leading to simple exact tests, and the possible consequences of heterogeneous variances will be minimized.

The assumption of independence seems to be of major importance, and its violation may lead to erroneous conclusions. (For a discussion of the problem of serial correlation created by observations taken in time sequence, see Section 3.17.) For this reason, great care should be taken in the planning and analysis of experiments involving repeated observations to ensure the independence of error terms. Thus, random assignment of experimental units to the treatment combinations is especially important.


MODEL II (RANDOM EFFECTS) 


Nonnormality of any of the random effects can seriously affect the distribution theory of the sums of squares involving them. The point estimates of variance components are still unbiased, but the effects of nonnormality on tests and confidence intervals can lead to erroneous results. In particular, the tests and confidence intervals on $\sigma^2$ are very sensitive to nonnormality. However, the statistics $MS_A/MS_{AB}$ and $MS_B/MS_{AB}$ for testing the main-effect variance components are somewhat robust.

Very little is known about the effects of unequal variances and the lack of independence on the inferences for the two-way random model.
MODEL III (MIXED EFFECTS)

There are very few studies dealing with the effects of violations of assumptions on the inferences for the two-way mixed model. For balanced or nearly balanced designs and moderate departures from normality, the effect of nonnormal random effects $\{e_{ijk}\}$, $\{(\alpha\beta)_{ij}\}$, $\{\beta_j\}$ on tests about the fixed effects $\{\alpha_i\}$ is expected to be rather small. In the presence of appreciable nonnormality, however, the consequences could be more serious. In regard to tests and confidence intervals for the variance components, nonnormality may make the results extremely misleading.

The problem of unequal variances could occur in terms of the variances of any of the random effects $\{e_{ijk}\}$, $\{(\alpha\beta)_{ij}\}$, and $\{\beta_j\}$. For tests on $\{\alpha_i\}$, the effect of varying $\sigma^2$ and $\sigma^2_{\alpha\beta}$ should be somewhat similar to that in the two-way fixed effects model (3.1.1) (with one observation per cell), since $MS_{AB}$ is used in the denominator of the F test. The effect of varying $\sigma^2$ on testing the hypotheses concerning the variance components $\sigma^2_{\alpha\beta}$ and $\sigma^2_\beta$ should be similar to that in the one-way model (2.1.1), since $MS_E$ is used in the denominators of the associated F statistics.

Not much is known concerning the effect of lack of independence of the random effects $\{e_{ijk}\}$, $\{(\alpha\beta)_{ij}\}$, and $\{\beta_j\}$ on inferences for the two-way mixed model.


EXERCISES 


1. An industrial engineer wishes to determine whether four different makes of automobiles would yield the same mileage. An experiment is designed wherein a random sample of three cars of each make is selected from each of three cities, and each car is given a test run with one gallon of gasoline. The results on the number of miles traveled are given as follows.

                          Make of Automobile
   City             I       II      III     IV

   Boston          24.4    23.6    27.1    22.6
                   23.9    22.7    28.0    22.3
                   25.5    22.9    27.4    23.6
   Los Angeles     25.7    24.2    25.1    24.5
                   26.5    23.9    26.8    24.2
                   25.4    24.6    24.8    25.3
   Dallas          23.9    23.7    27.3    24.4
                   22.7    23.3    27.0    23.5
                   25.1    24.8    26.6    24.1

(a) Why was it considered necessary to include three cities in the experiment rather than just one city?

(b) How would you obtain a random sample of three cars from a city?

(c) What assumptions are made about the populations, and what hypotheses can be tested?

(d) Describe the model and the assumptions for the experiment.

(e) Analyze the data and report the analysis of variance table.

(f) Test whether there are differences in mileage among the cities. Use $\alpha = 0.05$.

(g) Test whether there are differences in mileage among the makes of automobiles. Use $\alpha = 0.05$.

(h) Test whether there are interaction effects between cities and makes of automobiles. Use $\alpha = 0.05$.

2. An experiment is designed to compare the corrosion effect on three leading metal products. Eighteen samples, six of each metal, were used in the experiment, and they were assigned at random into six groups of three each. The first three groups had densities (kg/mm²) taken after a test period of 30 hours, and the next three groups were measured after a test period of 60 hours. The relevant data in certain standard units are given as follows.

                             Metal Product
   Test Period (hrs)    Steel    Copper    Zinc

   30                    149      158       129
                         126      129       124
                         115      158       154
   60                    152      112       126
                         142      152       151
                         124      117       138

(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in corrosion effect among the three metal products. Use $\alpha = 0.05$.

(d) Test whether there are differences in corrosion effect between the two test periods. Use $\alpha = 0.05$.

(e) Test whether there are interaction effects between metal products and test periods. Use $\alpha = 0.05$.

(f) Determine a 95 percent confidence interval for $\sigma^2$.

(g) Let $\alpha_i$ be the effect of the $i$-th test period and $\beta_j$ be the effect of the $j$-th metal product. Determine simultaneous confidence intervals for $\alpha_1 - \alpha_2$ and $\beta_1 - \beta_3$ using an overall confidence level of 0.95.

3. A tool manufacturer wishes to study the effect of tool temperature and tool speed on a certain type of milling machine. An experiment was designed wherein two levels of tool temperature (300°F and 500°F) and four levels of tool speed (V1, V2, V3, and V4) were used, and three measurements were made for each combination of tool temperature and tool speed. The relevant data on milling machine measurements in certain standard units are given as follows.

   Tool                          Tool Speed
   Temperature (°F)    V1       V2       V3       V4

   300                4,783    5,720    5,185    5,530
                      5,373    5,190    5,150    5,540
                      5,383    5,523    5,397    5,155
   500                5,225    5,837    5,131    5,493
                      5,533    5,180    5,290    5,341
                      5,145    5,190    5,235    5,390

(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in milling machine measurements among the four tool speeds. Use $\alpha = 0.05$.

(d) Test whether there are differences in milling machine measurements between the two tool temperatures. Use $\alpha = 0.05$.

(e) Test whether there are interaction effects between tool temperatures and tool speeds. Use $\alpha = 0.05$.

(f) Determine a 95 percent confidence interval for $\sigma^2$.

(g) Let $\alpha_i$ be the effect of the $i$-th tool temperature and $\beta_j$ be the effect of the $j$-th tool speed. Determine simultaneous confidence intervals for $\alpha_1 - \alpha_2$ and $\beta_1 - \beta_3$ using an overall confidence level of 0.95.

4. An experiment was performed to determine the “active life” of three specimens of punching dies P1, P2, and P3 taken from seven punching machines, M1, M2, ..., M7, in a certain factory. The relevant data on measurements in minutes are given as follows.

   Punching                          Punching Machine
   Die        M1     M2     M3     M4     M5     M6     M7

   P1        35.8   37.7   36.8   36.9   39.3   37.4   41.3
             38.2   40.2   40.9   35.9   37.5   38.8   43.3
   P2        38.3   33.9   37.8   35.7   35.9   36.5   37.1
             36.1   37.2   38.5   33.1   37.3   38.3   36.4
   P3        38.7   39.7   38.9   37.3   36.9   38.1   40.0
             35.9   40.6   35.6   35.7   35.4   35.6   35.9

(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in “active life” among the seven punching machines. Use $\alpha = 0.05$.

(d) Test whether there are differences in “active life” among the three punching dies. Use $\alpha = 0.05$.

(e) Test whether there are interaction effects between punching dies and punching machines. Use $\alpha = 0.05$.

(f) Determine a 95 percent confidence interval for $\sigma^2$.

(g) Let $\alpha_i$ be the effect of the $i$-th punching die and $\beta_j$ be the effect of the $j$-th punching machine. Determine simultaneous confidence intervals for $\alpha_1 - \alpha_2$ and $\beta_1 - \beta_3$ using an overall confidence level of 0.95.

5. A production engineer wishes to study the effect of cutting temperature and cutting pressure on the surface finish of the machined component. He designs an experiment wherein three levels of each factor are selected, and a factorial experiment with two replicates is run. The relevant data in certain standard units are given as follows.

                         Temperature
   Pressure       Low      Medium     High

   Low            51.5     51.8       51.3
                  51.3     51.7       51.5
   Medium         51.2     51.6       49.9
                  51.4     51.7       51.2
   High           51.6     51.9       51.5
                  51.8     51.8       51.2

(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in the surface finish among the three levels of temperature. Use $\alpha = 0.05$.

(d) Test whether there are differences in the surface finish among the three levels of pressure. Use $\alpha = 0.05$.

(e) Test whether there are interaction effects between cutting temperatures and cutting pressures. Use $\alpha = 0.05$.

(f) Determine a 95 percent confidence interval for $\sigma^2$.

(g) Let $\alpha_i$ be the effect of the $i$-th pressure and $\beta_j$ be the effect of the $j$-th temperature. Determine simultaneous confidence intervals for $\alpha_1 - \alpha_2$ and $\beta_1 - \beta_3$ using an overall confidence level of 0.95.

(h) Evaluate the power of the test for detecting a true difference in pressure such that $\sum_{i=1}^3 \alpha_i^2 = 0.10$, where $\alpha_i$ is the $i$-th level pressure effect.

6. The following table gives the partial results of the analysis of variance computations performed on the data on the life of five brands of plastic products used under five different process temperatures. Three plastics of each brand were used for each process temperature. Complete the analysis of variance table and perform the relevant tests of hypotheses of interest to the experimenter. Why was it thought necessary to include different process temperatures in the experiment? Explain.

   Source of Variation     Sum of Squares

   Brand                       311.23
   Temperature                 321.34
   Interaction                   ...
   Error                        23.31
   Total                       915.7


7. It is suspected that the strength of a tensile specimen is affected by the strain rate and the temperature. A factorial experiment is designed wherein four temperatures are randomly selected for each of three strain rates. The relevant data in certain standard units are given as follows.

   Strain Rate                 Temperature (°F)
   (s⁻¹)           100      200      300      400

   0.10             81       86       89      106
                    91       75       95      111
                    67       79       99      103
   0.20            109      105      106      111
                    93      111      115      107
                    95       95      102      106
   0.30            106      111      115      111
                   105      106      117      118
                   109      102      106      114

(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in the tensile strength among the levels of temperature. Use $\alpha = 0.05$.

(d) Test whether there are differences in the tensile strength among the strain rates. Use $\alpha = 0.05$.

(e) Test whether there are interaction effects between levels of temperature and strain rates. Use $\alpha = 0.05$.

(f) Determine point and interval estimates of the variance components of the model.

(g) Determine a 95 percent confidence interval for the mean difference in response for strain rates of 0.10 and 0.30.

(h) Analyze the data using the alternate mixed model discussed in Section 4.21 and compare the results obtained from the two models.
8. A production control engineer wishes to study the factors that influence the breaking strength of metallic sheets. He designs an experiment wherein four machines and three robots are selected at random, and a factorial experiment is performed using metallic sheets from the same production batch. The relevant data on breaking strength in certain standard units are given as follows.

                           Machine
   Robot         1       2       3       4

   1            112     113     111     113
                113     118     112      ?
   2            113     113     114     118
                115     114     112     117
   3            119     115     117     123
                117     118     122     119

(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in the breaking strength of the metallic sheets among machines. Use $\alpha = 0.05$.

(d) Test whether there are differences in the breaking strength of the metallic sheets among robots. Use $\alpha = 0.05$.

(e) Test whether there are interaction effects between machines and robots. Use $\alpha = 0.05$.

(f) Determine point and interval estimates of the variance components of the model.

(g) Determine the power of the test for detecting a machine effect such that $\sigma^2_\beta = \sigma^2$, where $\sigma^2_\beta$ is the variance component for the machine factor and $\sigma^2$ is the error variance component.

(h) Suppose that the robots were selected at random, but only four machines were available for the test. Test for the main effects and interaction at the 5 percent level of significance. Does the new experimental situation affect either the analysis or the conclusions of your study?

9. A production control engineer wishes to study the thrust force generated by a lathe. He suspects that the cutting speed and the depth of cut of the material are the most important determining factors. He designs an experiment wherein four depths of cut are randomly selected and a high and a low cutting speed are chosen to represent the extreme operating conditions. The relevant data in certain standard units are given as follows.

   Cutting                 Depth of Cut
   Speed          0.01        0.03        0.05

   Low            2.61        2.36        2.66
                  2.69        2.39        2.77
   High           2.74        2.77        2.85
                  2.77        2.78        2.79

(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in thrust force among depths of cut. Use $\alpha = 0.05$.

(d) Test whether there are differences in thrust force between cutting speeds. Use $\alpha = 0.05$.

(e) Test whether there are interaction effects between cutting speeds and depths of cut. Use $\alpha = 0.05$.

(f) Estimate the variance components of the model (point and interval estimates).


10. A quality control engineer wishes to study the influence of furnace temperature and type of material on the quality of a cast product. An experiment was designed to include three levels of furnace temperature (1200°F, 1250°F, and 1300°F) for each of three types of material. The relevant data in certain standard units are given as follows.

                              Temperature
   Material     1200°F      1250°F      1300°F

   1             79?         2191        2493
                 779         2198        2491
                 781         2196        2497
   2             761         2181        2439
                 741         2145        2423
                 789         2111        2399
   3             757         2156         978
                 786         2164        1115
                 799         2177         999

(a) State the model and the assumptions for the experiment. Assume that both factors are fixed.

(b) Analyze the data and report the analysis of variance table.

(c) Does the material type affect the response? Use $\alpha = 0.05$.

(d) Does the temperature affect the response? Use $\alpha = 0.05$.

(e) Is there a significant interaction effect? Use $\alpha = 0.05$.
11. Crump (1946) reported the results of analysis of variance performed on the data from four successive genetic experiments on egg production with the same sample of 25 races of the common fruitfly (Drosophila melanogaster), 12 females being sampled from each race for each experiment. The observations were the total number of eggs produced by a female on the fourth day of laying. The mathematical model for this experiment would be

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}, \qquad i = 1, 2, 3, 4;\ j = 1, 2, \ldots, 25;\ k = 1, 2, \ldots, 12,$$

where $\mu$ is the general mean, $\alpha_i$ is the effect of the $i$-th experiment, $\beta_j$ is the effect of the $j$-th race, $(\alpha\beta)_{ij}$ is the interaction of the $i$-th experiment with the $j$-th race, and the $e_{ijk}$'s are experimental errors. It is further assumed that $\alpha_i \sim N(0, \sigma^2_\alpha)$, $\beta_j \sim N(0, \sigma^2_\beta)$, $(\alpha\beta)_{ij} \sim N(0, \sigma^2_{\alpha\beta})$, and that the $\alpha_i$'s, $\beta_j$'s, $(\alpha\beta)_{ij}$'s, and $e_{ijk}$'s are mutually and completely independent. The analysis of variance computations of the data (not reported here) are carried out exactly as in Section 4.15, and the results on sums of squares are given as follows.

Analysis of Variance for Genetic Experiments Data

   Source of      Degrees of   Sum of     Mean     Expected
   Variation      Freedom      Squares    Square   Mean Square   F Value   p-Value

   Experiment                  139,977
   Race                         77,832
   Interaction                  33,048
   Error                       254,100
   Total                       504,957

Source: Crump (1946). Used with permission.

(a) Complete the remaining columns of the preceding analysis of variance table.

(b) Test whether there are differences in egg production among different experiments. Use $\alpha = 0.05$.

(c) Test whether there are differences in egg production among different races. Use $\alpha = 0.05$.

(d) Test whether there are interaction effects between experiments and races. Use $\alpha = 0.05$.

(e) Determine point and interval estimates for each of the variance components of the model.

(f) Suppose that the 25 races used in the experiment are of particular interest to the experimenter, and thus this factor is considered to have a fixed effect. Perform tests of hypotheses and obtain estimates of the variance components under the assumptions of the mixed model.

12. Box and Cox (1964) reported data from an experiment designed to investigate the effects of certain toxic agents. Groups of four animals were randomly allocated to three poisons and four treatments using a 3 × 4 replicated factorial design. The survival times (unit: 10 hours) of the animals were recorded, and the data are given as follows.

                        Treatment
   Poison      A       B       C       D

   I          0.31    0.82    0.43    0.45
              0.45    1.10    0.45    0.71
              0.46    0.88    0.63    0.66
              0.43    0.72    0.76    0.62
   II         0.36    0.92    0.44    0.56
              0.29    0.61    0.35    1.02
              0.40    0.49    0.31    0.71
              0.23    1.24    0.40    0.38
   III        0.22    0.30    0.23    0.30
              0.21    0.37    0.25    0.36
              0.18    0.38    0.24    0.31
              0.23    0.29    0.22    0.33

Source: Box and Cox (1964). Used with permission.

(a) State the model and the assumptions for the experiment. Assume that both poison and treatment factors are fixed.

(b) Analyze the data and report the analysis of variance table.

(c) Does the poison type affect the survival time? Use $\alpha = 0.05$.

(d) Does the treatment affect the survival time? Use $\alpha = 0.05$.

(e) Is there a significant interaction effect? Use $\alpha = 0.05$.

13. Scheffé (1959, pp. 140–141) reported data from an experiment designed to study the variation in weight of hybrid female rats in foster nursing. A two-factor factorial design was used, with the factors in the two-way layout being the genotype of the foster mother and that of the litter. The weights in grams, as litter averages at 28 days, were recorded, and the data are given as follows.

   Genotype          Genotype of Foster Mother
   of Litter      A       F       I       J

   A             61.5    55.0    52.5    42.0
                 68.2    42.0    61.8    54.0
                 64.0    60.2    49.5    61.0
                 65.0            52.7    48.2
                 59.7                    39.6
   F             60.3    50.8    56.5    51.3
                 51.7    64.7    59.0    40.5
                 49.3    61.7    47.2
                 48.0    64.0    53.0
                         62.0
   I             37.0    56.3    39.7    50.0
                 36.3    69.8    46.0    43.8
                 68.0    67.0    61.3    54.5
                                 55.3
                                 55.7
   J             59.0    59.5    45.2    44.8
                 57.4    52.8    57.0    51.5
                 54.0    56.0    61.4    53.0
                 47.0                    42.0
                                         54.0

Source: Scheffé (1959, p. 140). Used with permission.

(a) State the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table using the unweighted means analysis.

(c) Analyze the data and report the analysis of variance table using the weighted means analysis.

(d) Perform appropriate F tests, using the unweighted and weighted means analyses. Use $\alpha = 0.05$.

(e) Compare the results from the weighted and unweighted means analyses.

14. Davies and Goldsmith (1972, p. 154) reported data from an experi- 
ment designed to investigate sources of variability in testing strength 
of Portland cement. Several small samples of a sample of cement 
were mixed with water and worked for a fixed time, by three differ- 
ent persons (gaugers), and then were cast into cubes. The cubes were 
later tested for compressive strength by three other persons (break- 
ers). Each gauger worked with 12 cubes which were then divided 
into three sets of four, and each breaker tested one set of four cubes 
from each gauger. All the testing was done on the same machine 
and the overall objective of the study was to investigate and quan- 
tify the relative magnitude of the variability in test results due to 
individual differences between gaugers and between breakers. The
data are shown below where measurements are given in the original 
units of pounds per square inch. 


                           Breaker
Gauger      1              2              3
1           5280  5520     4340  4400     4160  5180
            4760  5800     5020  6200     5320  4600
2           4420  5280     5340  4880     4180  4800
            5580  4900     4960  6200     4600  4480
3           5360  6160     5720  4760     4460  4930
            5680  5500     5620  5560     4680  5600

Source: Davies and Goldsmith (1972, p. 154). Used with permission. 


(a) Describe the model and the assumptions for the experiment.
Would you use Model I, Model II, or Model III? In the original
experiment, the investigator’s interest was in these particular
gaugers and breakers.

(b) Analyze the data and report the analysis of variance table. 

(c) Test whether there are differences in testing strength due to
gaugers. Use α = 0.05.

(d) Test whether there are differences in testing strength due to
breakers. Use α = 0.05.

(e) Test whether there are interaction effects between gaugers and
breakers. Use α = 0.05.

(f) Assuming that the gauger and breaker effects are random, esti- 
mate the variance components of the model (point and interval 
estimates) and determine their relative importance. 


Three-Way and 
Higher-Order Crossed 
Classifications 


5.0 PREVIEW 


Many experiments and surveys involve three or more factors. Multifactor lay- 
outs entail data collection under conditions determined by several factors 
simultaneously. Such layouts usually provide more information and often can be 
even more economical than separate one-way or two-way designs. The models 
and analysis of variance for the case of three or more factors are straightforward 
extensions of the two-way crossed model. The methods of analysis of variance 
for the two-way crossed classification discussed in the preceding two chapters 
can thus be readily generalized to three-way and higher-order classifications. 
In this chapter, we study the three-way crossed classification in some detail
because it illustrates how the analysis can be extended when four or more
factors are involved. Generalizations to four-way and higher-order
classifications are briefly outlined.


5.1 MATHEMATICAL MODEL 


Consider three factors A, B, and C having a, b, and c levels, respectively, and
let there be n observations in each of the abc cells of the three-way layout. Let
$y_{ijk\ell}$ be the $\ell$-th observation corresponding to the i-th level of factor A, the j-th
level of factor B, and the k-th level of factor C. Thus, there is a total of

$$N = abcn$$

observations in the study. The data involving a total of $N = abcn$ scores $y_{ijk\ell}$
can then be schematically represented as in Table 5.1.

The notation employed here is a straightforward extension of the two-way 
crossed classification. As usual, a dot in the subscript indicates aggregation and 
a dot and a bar indicate averaging over the index represented by the dot. Thus, 
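we employ the following notations for sample totals and means:

$$y_{ijk.} = \sum_{\ell=1}^{n} y_{ijk\ell}, \qquad \bar{y}_{ijk.} = y_{ijk.}/n;$$

$$y_{ij..} = \sum_{k=1}^{c}\sum_{\ell=1}^{n} y_{ijk\ell}, \qquad \bar{y}_{ij..} = y_{ij..}/cn;$$

$$y_{i.k.} = \sum_{j=1}^{b}\sum_{\ell=1}^{n} y_{ijk\ell}, \qquad \bar{y}_{i.k.} = y_{i.k.}/bn;$$

$$y_{.jk.} = \sum_{i=1}^{a}\sum_{\ell=1}^{n} y_{ijk\ell}, \qquad \bar{y}_{.jk.} = y_{.jk.}/an;$$

$$y_{i...} = \sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n} y_{ijk\ell}, \qquad \bar{y}_{i...} = y_{i...}/bcn;$$

$$y_{.j..} = \sum_{i=1}^{a}\sum_{k=1}^{c}\sum_{\ell=1}^{n} y_{ijk\ell}, \qquad \bar{y}_{.j..} = y_{.j..}/acn;$$

$$y_{..k.} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{\ell=1}^{n} y_{ijk\ell}, \qquad \bar{y}_{..k.} = y_{..k.}/abn;$$

and

$$y_{....} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n} y_{ijk\ell}, \qquad \bar{y}_{....} = y_{....}/abcn.$$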

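The analysis of variance model for this type of experimental layout is given
as

$$y_{ijk\ell} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + e_{ijk\ell}, \qquad \begin{cases} i = 1, 2, \ldots, a \\ j = 1, 2, \ldots, b \\ k = 1, 2, \ldots, c \\ \ell = 1, 2, \ldots, n \end{cases} \tag{5.1.1}$$

where

$\mu$ is the general mean,
$\alpha_i$ is the effect of the i-th level of factor A,
$\beta_j$ is the effect of the j-th level of factor B,
$\gamma_k$ is the effect of the k-th level of factor C,
$(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$, $(\beta\gamma)_{jk}$ are the effects of the two-factor interactions A × B,
A × C, and B × C, respectively,
$(\alpha\beta\gamma)_{ijk}$ is the effect of the three-factor interaction A × B × C, and
$e_{ijk\ell}$ is the customary error term.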
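5.2 ASSUMPTIONS OF THE MODEL
The assumptions of the model (5.1.1) are as follows:

(i) The $e_{ijk\ell}$'s are uncorrelated and randomly distributed with common mean
zero and variance $\sigma_e^2$.

(ii) Under Model I, the $\alpha_i$'s, $\beta_j$'s, $\gamma_k$'s, $(\alpha\beta)_{ij}$'s, $(\alpha\gamma)_{ik}$'s, $(\beta\gamma)_{jk}$'s, and
$(\alpha\beta\gamma)_{ijk}$'s are constants subject to the restrictions:

$$\sum_{i=1}^{a}\alpha_i = \sum_{j=1}^{b}\beta_j = \sum_{k=1}^{c}\gamma_k = 0,$$

$$\sum_{i=1}^{a}(\alpha\beta)_{ij} = \sum_{j=1}^{b}(\alpha\beta)_{ij} = \sum_{i=1}^{a}(\alpha\gamma)_{ik} = \sum_{k=1}^{c}(\alpha\gamma)_{ik} = \sum_{j=1}^{b}(\beta\gamma)_{jk} = \sum_{k=1}^{c}(\beta\gamma)_{jk} = 0,$$

and

$$\sum_{i=1}^{a}(\alpha\beta\gamma)_{ijk} = \sum_{j=1}^{b}(\alpha\beta\gamma)_{ijk} = \sum_{k=1}^{c}(\alpha\beta\gamma)_{ijk} = 0.$$

(iii) Under Model II, the $\alpha_i$'s, $\beta_j$'s, $\gamma_k$'s, $(\alpha\beta)_{ij}$'s, $(\alpha\gamma)_{ik}$'s, $(\beta\gamma)_{jk}$'s, $(\alpha\beta\gamma)_{ijk}$'s,
and $e_{ijk\ell}$'s are mutually and completely uncorrelated random variables
with mean zero and respective variances $\sigma_\alpha^2$, $\sigma_\beta^2$, $\sigma_\gamma^2$, $\sigma_{\alpha\beta}^2$, $\sigma_{\alpha\gamma}^2$, $\sigma_{\beta\gamma}^2$,
$\sigma_{\alpha\beta\gamma}^2$, and $\sigma_e^2$.

(iv) Under Model III, several variations exist depending upon which factors
are assumed fixed and which random. Suppose that factor A has fixed
effects and factors B and C have random effects. In this case, the $\alpha_i$'s
are constants; the $\beta_j$'s, $\gamma_k$'s, $(\alpha\beta)_{ij}$'s, $(\alpha\gamma)_{ik}$'s, $(\beta\gamma)_{jk}$'s, $(\alpha\beta\gamma)_{ijk}$'s, and
$e_{ijk\ell}$'s are random variables with mean zero and respective variances
$\sigma_\beta^2$, $\sigma_\gamma^2$, $\sigma_{\alpha\beta}^2$, $\sigma_{\alpha\gamma}^2$, $\sigma_{\beta\gamma}^2$, $\sigma_{\alpha\beta\gamma}^2$, and $\sigma_e^2$, subject to the restrictions:

$$\sum_{i=1}^{a}\alpha_i = 0 \tag{5.2.1}$$

and

$$\sum_{i=1}^{a}(\alpha\beta)_{ij} = \sum_{i=1}^{a}(\alpha\gamma)_{ik} = \sum_{i=1}^{a}(\alpha\beta\gamma)_{ijk} = 0 \quad \text{for all } j \text{ and } k. \tag{5.2.2}$$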


Note that all interaction terms in Model III are assumed to be random, since 
at least one of the factors involved is a random effects factor. Furthermore, 
the sums of effects involving the fixed factor are zero when summed over the 
fixed factor levels. The correlations between random effects resulting from 
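restrictions (5.2.2) can be derived, but are not considered here. Other mixed
effects models can be developed in a similar fashion. For example, a model
analogous to the two-way mixed model discussed in Section 4.21 involves
the restriction (5.2.1) but not (5.2.2). This implies that the random effects are
mutually and completely uncorrelated random variables.¹

¹ For a more general formulation of the three-way mixed model as an extension of the two-way
mixed model by Scheffé, see Imhof (1960).


5.3 PARTITION OF THE TOTAL SUM OF SQUARES
As before, the total sum of squares can be partitioned by starting with the
identity:

$$\begin{aligned}
y_{ijk\ell} - \bar{y}_{....} ={}& (\bar{y}_{i...} - \bar{y}_{....}) + (\bar{y}_{.j..} - \bar{y}_{....}) + (\bar{y}_{..k.} - \bar{y}_{....}) \\
&+ (\bar{y}_{ij..} - \bar{y}_{i...} - \bar{y}_{.j..} + \bar{y}_{....}) \\
&+ (\bar{y}_{i.k.} - \bar{y}_{i...} - \bar{y}_{..k.} + \bar{y}_{....}) \\
&+ (\bar{y}_{.jk.} - \bar{y}_{.j..} - \bar{y}_{..k.} + \bar{y}_{....}) \\
&+ (\bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{i.k.} - \bar{y}_{.jk.} + \bar{y}_{i...} + \bar{y}_{.j..} + \bar{y}_{..k.} - \bar{y}_{....}) \\
&+ (y_{ijk\ell} - \bar{y}_{ijk.}).
\end{aligned} \tag{5.3.1}$$

Squaring each side and summing over i, j, k, and ℓ, and noting that the cross-product
terms drop out, we obtain

$$SS_T = SS_A + SS_B + SS_C + SS_{AB} + SS_{AC} + SS_{BC} + SS_{ABC} + SS_E,$$

where

$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n}(y_{ijk\ell} - \bar{y}_{....})^2,$$

$$SS_A = bcn\sum_{i=1}^{a}(\bar{y}_{i...} - \bar{y}_{....})^2,$$

$$SS_B = acn\sum_{j=1}^{b}(\bar{y}_{.j..} - \bar{y}_{....})^2,$$

$$SS_C = abn\sum_{k=1}^{c}(\bar{y}_{..k.} - \bar{y}_{....})^2,$$

$$SS_{AB} = cn\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{y}_{ij..} - \bar{y}_{i...} - \bar{y}_{.j..} + \bar{y}_{....})^2,$$

$$SS_{AC} = bn\sum_{i=1}^{a}\sum_{k=1}^{c}(\bar{y}_{i.k.} - \bar{y}_{i...} - \bar{y}_{..k.} + \bar{y}_{....})^2,$$

$$SS_{BC} = an\sum_{j=1}^{b}\sum_{k=1}^{c}(\bar{y}_{.jk.} - \bar{y}_{.j..} - \bar{y}_{..k.} + \bar{y}_{....})^2,$$

$$SS_{ABC} = n\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}(\bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{i.k.} - \bar{y}_{.jk.} + \bar{y}_{i...} + \bar{y}_{.j..} + \bar{y}_{..k.} - \bar{y}_{....})^2,$$

and

$$SS_E = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n}(y_{ijk\ell} - \bar{y}_{ijk.})^2.$$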


Here, $SS_T$ is the total sum of squares; $SS_A$, $SS_B$, $SS_C$ are the usual main
effects sums of squares; $SS_{AB}$, $SS_{AC}$, $SS_{BC}$ are the usual two-factor interaction
sums of squares; $SS_{ABC}$ is the three-factor interaction sum of squares; and $SS_E$
is the error sum of squares.
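The partition above can be checked numerically. The following Python sketch (standard library only) evaluates the eight component sums of squares and $SS_T$ from their definitional formulas for a balanced layout. The function name `three_way_ss` and the nested-list data layout `y[i][j][k]` (the list of the n replicates in cell (i, j, k)) are illustrative assumptions, not part of the text.

```python
from itertools import product
from statistics import fmean


def three_way_ss(y):
    """Definitional sums of squares for a balanced three-way crossed layout.

    y[i][j][k] is the list of the n observations in cell (i, j, k); every
    cell is assumed to contain the same number n >= 1 of observations.
    """
    a, b, c = len(y), len(y[0]), len(y[0][0])
    n = len(y[0][0][0])
    I, J, K = range(a), range(b), range(c)

    # Cell means ybar_{ijk.}; with balanced data every marginal mean is an
    # unweighted average of cell means.
    cell = {(i, j, k): fmean(y[i][j][k]) for i, j, k in product(I, J, K)}
    m_ij = {(i, j): fmean(cell[i, j, k] for k in K) for i, j in product(I, J)}
    m_ik = {(i, k): fmean(cell[i, j, k] for j in J) for i, k in product(I, K)}
    m_jk = {(j, k): fmean(cell[i, j, k] for i in I) for j, k in product(J, K)}
    m_i = {i: fmean(m_ij[i, j] for j in J) for i in I}
    m_j = {j: fmean(m_ij[i, j] for i in I) for j in J}
    m_k = {k: fmean(m_ik[i, k] for i in I) for k in K}
    g = fmean(m_i.values())  # grand mean ybar_{....}

    ss = {
        "A": b * c * n * sum((m_i[i] - g) ** 2 for i in I),
        "B": a * c * n * sum((m_j[j] - g) ** 2 for j in J),
        "C": a * b * n * sum((m_k[k] - g) ** 2 for k in K),
        "AB": c * n * sum((m_ij[i, j] - m_i[i] - m_j[j] + g) ** 2
                          for i, j in product(I, J)),
        "AC": b * n * sum((m_ik[i, k] - m_i[i] - m_k[k] + g) ** 2
                          for i, k in product(I, K)),
        "BC": a * n * sum((m_jk[j, k] - m_j[j] - m_k[k] + g) ** 2
                          for j, k in product(J, K)),
        "ABC": n * sum((cell[i, j, k] - m_ij[i, j] - m_ik[i, k] - m_jk[j, k]
                        + m_i[i] + m_j[j] + m_k[k] - g) ** 2
                       for i, j, k in product(I, J, K)),
        "E": sum((x - cell[i, j, k]) ** 2
                 for i, j, k in product(I, J, K) for x in y[i][j][k]),
    }
    ss["T"] = sum((x - g) ** 2
                  for i, j, k in product(I, J, K) for x in y[i][j][k])
    return ss
```

For any balanced data set, the seven effect components plus $SS_E$ should add up to $SS_T$ apart from rounding error; the sketch does not handle unequal cell sizes.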


Remark: In a three-way crossed classification, one can compute abc separate cell
variances as $s_{ijk}^2 = \sum_{\ell=1}^{n}(y_{ijk\ell} - \bar{y}_{ijk.})^2/(n-1)$, which can then be tested for homogeneity of variances
(see Section 2.21).


5.4 MEAN SQUARES AND THEIR EXPECTATIONS 


As usual, the mean squares are obtained by dividing the sums of squares by the
corresponding degrees of freedom. The degrees of freedom for the main effects and
two-factor interaction sums of squares correspond to those for the two-way
classification. The number of degrees of freedom for the three-factor interaction,
$(a-1)(b-1)(c-1)$, is obtained by subtraction and corresponds to the number of independent linear
relations among all the interaction terms $(\alpha\beta\gamma)_{ijk}$'s.
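A minimal sketch of the degrees-of-freedom bookkeeping described above, assuming a balanced layout with n observations per cell; the helper names `three_way_df` and `mean_squares` are hypothetical. The three-factor entry is written out as $(a-1)(b-1)(c-1)$, which is what the subtraction yields, and the error entry $abc(n-1)$ is zero when n = 1.

```python
def three_way_df(a, b, c, n):
    """Degrees of freedom for each source of variation in model (5.1.1)."""
    df = {
        "A": a - 1,
        "B": b - 1,
        "C": c - 1,
        "AB": (a - 1) * (b - 1),
        "AC": (a - 1) * (c - 1),
        "BC": (b - 1) * (c - 1),
        "ABC": (a - 1) * (b - 1) * (c - 1),
        "E": a * b * c * (n - 1),  # zero when n = 1 (no within-cell replication)
    }
    df["T"] = a * b * c * n - 1
    return df


def mean_squares(ss, df):
    """Divide each sum of squares by its degrees of freedom.

    The total is skipped, as is any source with zero degrees of freedom.
    """
    return {s: ss[s] / df[s] for s in ss if s != "T" and df[s] > 0}
```

The individual entries always add up to the total, $abcn - 1$.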

The expected mean squares are obtained in the same way as in the earlier
derivations.² The results on the partition of the degrees of freedom and the sum
of squares, and the expected mean squares are summarized in the form of an
analysis of variance table as shown in Table 5.2.

² One can use the algorithms formulated by Schultz (1955) and others to reproduce the results on
expected mean squares rather quickly. See Appendix U for a discussion of the rules for finding
expected mean squares.


5.5 TESTS OF HYPOTHESES: THE ANALYSIS OF
VARIANCE F TESTS


By assuming the normality of the random components in model (5.1.1), the
sampling distributions of mean squares can be derived in terms of central and
noncentral chi-square variables. The results are obvious extensions of the
results given in Section 4.5 for the two-way classification. The tests for main and
interaction effects can be readily obtained from the results on the sampling distributions
of mean squares and their expectations. In the following, we summarize
the tests for fixed, random, and mixed effects models.


MODEL I (FIXED EFFECTS)


We note from the analysis of variance Table 5.2 that $MS_A$, $MS_B$, $MS_C$, $MS_{AB}$,
$MS_{AC}$, $MS_{BC}$, and $MS_{ABC}$ all have expectations equal to $\sigma_e^2$ if there are no
factor effects of the type reflected by the corresponding mean squares. If there
are such effects, each mean square has an expectation exceeding $\sigma_e^2$. Also,
the expectation of $MS_E$ is always $\sigma_e^2$, as was the case in the analyses of other
models. Hence, the tests for factor effects and their interactions can be obtained
by comparing the appropriate mean square against $MS_E$, a large value of
the mean square ratio indicating the presence of the corresponding factor or
interaction effect. The development of various test procedures follows the same
pattern as in the case of two-way classification. In Table 5.3, we summarize
the hypotheses of interest, corresponding test statistics, and the appropriate
percentiles of the F distribution.


Remarks: (i) An examination of Table 5.2 reveals that if n = 1, there are no degrees
of freedom associated with the error term. Thus, we must have at least 2 observations
per cell (n ≥ 2) in order to determine a sum of squares due to error if all possible interactions
are included in the model. For further discussion of this point, see Section 5.10.

(ii) If some of the interaction terms are zero, one may consider the possibility of 
pooling those terms with the error sum of squares. However, as discussed earlier in 
Section 4.6, the pooling of nonsignificant mean squares should be carried out with 
a great deal of discretion and not as a general rule. The pooling should probably be 
restricted to the mean squares corresponding to the effects that from prior experience 
are unlikely to yield significant results (not expected to be appreciable). 

(iii) As in the case of two-way crossed classification, it may be of interest to consider
the significance level associated with the experiment as a whole. Let $\alpha_1, \alpha_2, \ldots, \alpha_7$ be
the significance levels of the seven F statistics $F_A, F_B, \ldots, F_{ABC}$, respectively, and
let $\alpha$ be the significance level comprising all seven tests. Then again it follows that
$\alpha \le 1 - \prod_{i=1}^{7}(1 - \alpha_i)$. For example, if $\alpha_1 = \alpha_2 = \cdots = \alpha_7 = 0.05$, then $\alpha \le 0.302$; and
if $\alpha_1 = \alpha_2 = \cdots = \alpha_7 = 0.01$, then $\alpha \le 0.068$.
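The bound in Remark (iii) is simple arithmetic. A short Python check (the function name `familywise_bound` is illustrative) reproduces the two values quoted above.

```python
def familywise_bound(levels):
    """Upper bound 1 - prod(1 - alpha_i) on the overall significance level."""
    prod = 1.0
    for alpha in levels:
        prod *= 1.0 - alpha
    return 1.0 - prod


print(round(familywise_bound([0.05] * 7), 3))  # 0.302
print(round(familywise_bound([0.01] * 7), 3))  # 0.068
```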


MODEL II (RANDOM EFFECTS) 


The appropriate test statistics for various hypotheses of interest can be deter- 
mined by examining the expected mean squares in Table 5.2. However, for the 
first time, we encounter the difficulty that even under the normality assump- 
tion exact F tests may not be available for some of the hypotheses usually 
tested. There is no difficulty about testing the hypotheses on the three-factor 
or the two-factor interactions. Thus, from the expected mean square column of 
TABLE 5.3
Tests of Hypotheses for Model (5.1.1) under Model I

Hypothesis                      Test Statistic        Percentile
H_0^A: all α_i = 0              F_A = MS_A/MS_E       F[a − 1, abc(n − 1); 1 − α]
  versus H_1^A: not all α_i = 0
H_0^B: all β_j = 0              F_B = MS_B/MS_E       F[b − 1, abc(n − 1); 1 − α]
  versus H_1^B: not all β_j = 0
H_0^C: all γ_k = 0              F_C = MS_C/MS_E       F[c − 1, abc(n − 1); 1 − α]
  versus H_1^C: not all γ_k = 0
H_0^AB: all (αβ)_ij = 0         F_AB = MS_AB/MS_E     F[(a − 1)(b − 1), abc(n − 1); 1 − α]
  versus H_1^AB: not all (αβ)_ij = 0
H_0^AC: all (αγ)_ik = 0         F_AC = MS_AC/MS_E     F[(a − 1)(c − 1), abc(n − 1); 1 − α]
  versus H_1^AC: not all (αγ)_ik = 0
H_0^BC: all (βγ)_jk = 0         F_BC = MS_BC/MS_E     F[(b − 1)(c − 1), abc(n − 1); 1 − α]
  versus H_1^BC: not all (βγ)_jk = 0
H_0^ABC: all (αβγ)_ijk = 0      F_ABC = MS_ABC/MS_E   F[(a − 1)(b − 1)(c − 1), abc(n − 1); 1 − α]
  versus H_1^ABC: not all (αβγ)_ijk = 0


Table 5.2, we see that the hypothesis $H_0^{ABC}: \sigma_{\alpha\beta\gamma}^2 = 0$ versus $H_1^{ABC}: \sigma_{\alpha\beta\gamma}^2 > 0$
can be tested with the ratio $MS_{ABC}/MS_E$; and $H_0^{AB}: \sigma_{\alpha\beta}^2 = 0$ versus $H_1^{AB}: \sigma_{\alpha\beta}^2 > 0$
with $MS_{AB}/MS_{ABC}$, and so on. Now, suppose that we wish to test $H_0^A: \sigma_\alpha^2 = 0$
versus $H_1^A: \sigma_\alpha^2 > 0$ (cases $H_0^B$ and $H_0^C$ can, of course, be treated similarly). If
we are willing to assume that $\sigma_{\alpha\beta}^2 = 0$, then an exact F test of $H_0^A$ can be based
on the statistic $MS_A/MS_{AC}$. In this case, $SS_{AB}$ could be pooled with $SS_{ABC}$
since they would have the same expected mean squares. Similarly, if we are
willing to assume that $\sigma_{\alpha\gamma}^2 = 0$, we may test $H_0^A$ with $MS_A/MS_{AB}$ and pool
$SS_{AC}$ with $SS_{ABC}$. Furthermore, if we are willing to assume other variance
components to be zero, there would be no difficulty in deducing exact tests,
if any, of the standard hypotheses and pooling procedures obtained from the
analysis of variance Table 5.2, by deleting in it the components assumed to be
zero. However, if we are unwilling to assume that $\sigma_{\alpha\beta}^2 = 0$ or $\sigma_{\alpha\gamma}^2 = 0$, then no
exact test of $H_0^A$ can be found from Table 5.2.


An approximate F test of $H_0^A$ can be obtained by using a procedure due
to Satterthwaite (1946) and Welch (1936, 1956). (For a detailed discussion
of the procedure, see Appendix K.) To illustrate the procedure for testing the
hypothesis

$$H_0^A: \sigma_\alpha^2 = 0 \quad \text{versus} \quad H_1^A: \sigma_\alpha^2 > 0, \tag{5.5.1}$$

we note from Table 5.2 that

$$E(MS_{AB}) + E(MS_{AC}) - E(MS_{ABC}) = \sigma_e^2 + cn\sigma_{\alpha\beta}^2 + bn\sigma_{\alpha\gamma}^2 + n\sigma_{\alpha\beta\gamma}^2,$$

which is precisely equal to $E(MS_A)$ when $\sigma_\alpha^2 = 0$. Hence, the suggested F
statistic is

$$F_A = \frac{MS_A}{MS_{AB} + MS_{AC} - MS_{ABC}}, \tag{5.5.2}$$

which has an approximate F distribution with $a - 1$ and $\nu_\alpha$ degrees of freedom,
where $\nu_\alpha$ is approximated by

$$\nu_\alpha = \frac{(MS_{AB} + MS_{AC} - MS_{ABC})^2}{\dfrac{(MS_{AB})^2}{(a-1)(b-1)} + \dfrac{(MS_{AC})^2}{(a-1)(c-1)} + \dfrac{(MS_{ABC})^2}{(a-1)(b-1)(c-1)}}. \tag{5.5.3}$$

Remarks: (i) Because of the lack of uniqueness of the approximate F ratio (different 
F ratios may result from the use of different linear combinations of mean squares) and 
because of the necessity of approximating the degrees of freedom, the procedure is of 
limited usefulness. However, if used with care, the test procedure can be of value. The 
reader is referred to Cochran (1951) for a detailed discussion of this problem. 

(ii) Usually, the degrees of freedom given by (5.5.3) will not be an integer. One can 
then either interpolate in the F distribution table, or round to the nearest integer. In 
practice, the choice of the nearest integer will be more than adequate.

(iii) An alternative test statistic for testing the hypothesis (5.5.1) is

$$F_A' = \frac{MS_A + MS_{ABC}}{MS_{AB} + MS_{AC}}.$$

The approximate degrees of freedom for both the numerator and the denominator are
obtained as in (5.5.3) using Satterthwaite’s rule. Because only the denominator degrees
of freedom need to be estimated, the test criterion (5.5.2) might be expected to
have better power, but it suffers from the drawback that the F approximation for (5.5.2) is
less accurate when the linear combination of mean squares contains a negative term.
Moreover, the denominator of the test statistic (5.5.2) can assume a negative value. The
problem, however, may be less important if the contribution of $MS_{ABC}$ is relatively small
and the corresponding degrees of freedom are large. The reader is referred to Cochran
and Cox (1957), Hudson and Krutchkoff (1968), and Gaylor and Hopper (1969) for some
further discussions and treatment of this topic. The general consensus seems to be that
the two statistics are comparable in terms of size and power performance under a wide
range of parameter values (see, e.g., Davenport and Webster (1973); Lorenzen (1987)).

(iv) An alternative to the Satterthwaite approximation for estimating the degrees of
freedom for $F_A$ and $F_A'$ has been proposed by Myers and Howe (1971), but the procedure
has been found to provide a liberal test (Davenport (1975)).


(v) An alternative to an approximate F test of $H_0^A: \sigma_\alpha^2 = 0$ has been proposed by
Jeyaratnam and Graybill (1980); it is tied to the lower confidence bound of $\sigma_\alpha^2$. For
a discussion of some other test procedures for this problem, see Naik (1974) and Seifert
(1981). Birch et al. (1990) and Burdick (1994) provide results of a simulation study to
compare several tests for the main effects variance components in model (5.1.1).


To obtain a procedure for testing the hypothesis

$$H_0^B: \sigma_\beta^2 = 0 \quad \text{versus} \quad H_1^B: \sigma_\beta^2 > 0,$$

interchange A and B (also a and b) in (5.5.2) and (5.5.3). Similarly, for the
hypothesis

$$H_0^C: \sigma_\gamma^2 = 0 \quad \text{versus} \quad H_1^C: \sigma_\gamma^2 > 0,$$

interchange A and C (also a and c) in (5.5.2) and (5.5.3).

Finally, similar to Table 5.3 for Model I, the hypotheses of interest, corresponding
test statistics, and the appropriate percentiles of the F distribution are
summarized in Table 5.4.


MODEL III (MIXED EFFECTS) 


Suppose that A is fixed and B and C are random. The approximate F test given
by (5.5.2) is used to test

$$H_0: \alpha_i = 0,\ i = 1, 2, \ldots, a \quad \text{versus} \quad H_1: \text{not all } \alpha_i\text{'s are zero}.$$

The other six F ratios test the following principal null hypotheses:

$$\sigma_\beta^2 = 0, \quad \sigma_\gamma^2 = 0, \quad \sigma_{\alpha\beta}^2 = 0, \quad \sigma_{\alpha\gamma}^2 = 0, \quad \sigma_{\beta\gamma}^2 = 0, \quad \text{and} \quad \sigma_{\alpha\beta\gamma}^2 = 0.$$

The results are summarized in Table 5.5. If B is the fixed factor and A and C
are random, then interchange A and B (also a and b) in Table 5.5. Similarly, if
TABLE 5.4
Tests of Hypotheses for Model (5.1.1) under Model II

Hypothesis              Test Statistic*                        Percentile
H_0^A: σ²_α = 0         F_A = MS_A/(MS_AB + MS_AC − MS_ABC)    F[a − 1, ν_α; 1 − α]
  versus H_1^A: σ²_α > 0
H_0^B: σ²_β = 0         F_B = MS_B/(MS_AB + MS_BC − MS_ABC)    F[b − 1, ν_β; 1 − α]
  versus H_1^B: σ²_β > 0
H_0^C: σ²_γ = 0         F_C = MS_C/(MS_AC + MS_BC − MS_ABC)    F[c − 1, ν_γ; 1 − α]
  versus H_1^C: σ²_γ > 0
H_0^AB: σ²_αβ = 0       F_AB = MS_AB/MS_ABC                    F[(a − 1)(b − 1), (a − 1)(b − 1)(c − 1); 1 − α]
  versus H_1^AB: σ²_αβ > 0
H_0^AC: σ²_αγ = 0       F_AC = MS_AC/MS_ABC                    F[(a − 1)(c − 1), (a − 1)(b − 1)(c − 1); 1 − α]
  versus H_1^AC: σ²_αγ > 0
H_0^BC: σ²_βγ = 0       F_BC = MS_BC/MS_ABC                    F[(b − 1)(c − 1), (a − 1)(b − 1)(c − 1); 1 − α]
  versus H_1^BC: σ²_βγ > 0
H_0^ABC: σ²_αβγ = 0     F_ABC = MS_ABC/MS_E                    F[(a − 1)(b − 1)(c − 1), abc(n − 1); 1 − α]
  versus H_1^ABC: σ²_αβγ > 0

* For the test statistics F_A, F_B, and F_C, the denominator degrees of freedom ν_α, ν_β, and ν_γ are
obtained using the formula (5.5.3) and its obvious analogues for ν_β and ν_γ.


C is fixed and A and B are random, then interchange A and C (also a and c) in 
Table 5.5. 

Next, suppose A is random and B and C are fixed. The results of all the 
principal hypotheses of interest are summarized in Table 5.6. Note that all the 
F tests are exact and no approximate tests are necessary. If the random factor 
is B, interchange the role of A and B (also a and b) in Table 5.6. If the random 
factor is C, interchange the role of A and C (also a and c) in Table 5.6. 


5.6 POINT AND INTERVAL ESTIMATION 


No new problems arise in obtaining unbiased estimators of variance components 
for random effects factors or in the estimation of contrasts for fixed effects 
factors in Models I or III. Confidence limits for contrasts for fixed effects 
factors are constructed by using the mean square employed in the denominator 
of the test statistic while testing for the effects of that factor. The degrees of 
freedom correspond to the mean square used in the denominator. The results 
on point estimation for Model I are summarized in Table 5.7. 
TABLE 5.5
Tests of Hypotheses for Model (5.1.1) under Model III
(A Fixed, B and C Random)

Hypothesis              Test Statistic*                        Percentile
H_0^A: all α_i = 0      F_A = MS_A/(MS_AB + MS_AC − MS_ABC)    F[a − 1, ν_α; 1 − α]
  versus H_1^A: not all α_i = 0
H_0^B: σ²_β = 0         F_B = MS_B/MS_BC                       F[b − 1, (b − 1)(c − 1); 1 − α]
  versus H_1^B: σ²_β > 0
H_0^C: σ²_γ = 0         F_C = MS_C/MS_BC                       F[c − 1, (b − 1)(c − 1); 1 − α]
  versus H_1^C: σ²_γ > 0
H_0^AB: σ²_αβ = 0       F_AB = MS_AB/MS_ABC                    F[(a − 1)(b − 1), (a − 1)(b − 1)(c − 1); 1 − α]
  versus H_1^AB: σ²_αβ > 0
H_0^AC: σ²_αγ = 0       F_AC = MS_AC/MS_ABC                    F[(a − 1)(c − 1), (a − 1)(b − 1)(c − 1); 1 − α]
  versus H_1^AC: σ²_αγ > 0
H_0^BC: σ²_βγ = 0       F_BC = MS_BC/MS_E                      F[(b − 1)(c − 1), abc(n − 1); 1 − α]
  versus H_1^BC: σ²_βγ > 0
H_0^ABC: σ²_αβγ = 0     F_ABC = MS_ABC/MS_E                    F[(a − 1)(b − 1)(c − 1), abc(n − 1); 1 − α]
  versus H_1^ABC: σ²_αβγ > 0

* For the test statistic F_A, the denominator degrees of freedom ν_α are determined using the formula
(5.5.3).


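An exact 100(1 − α) percent confidence interval for $\sigma_e^2$ is

$$\frac{abc(n-1)MS_E}{\chi^2[abc(n-1), 1-\alpha/2]} \le \sigma_e^2 \le \frac{abc(n-1)MS_E}{\chi^2[abc(n-1), \alpha/2]}. \tag{5.6.1}$$

Confidence intervals for other parameters under Model I are obtained from
the corresponding items in columns (2) and (3) of Table 5.7. For example, for
making pairwise comparisons, Bonferroni intervals with a simultaneous confidence
coefficient of at least 1 − α are given by

$$\bar{y}_{i...} - \bar{y}_{i'...} - \delta\sqrt{\frac{2MS_E}{bcn}} \le \alpha_i - \alpha_{i'} \le \bar{y}_{i...} - \bar{y}_{i'...} + \delta\sqrt{\frac{2MS_E}{bcn}}, \tag{5.6.2}$$

where $\delta = t[abc(n-1), 1-\alpha/2m]$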
TABLE 5.6
Tests of Hypotheses for Model (5.1.1) under Model III
(A Random, B and C Fixed)

Hypothesis                  Test Statistic        Percentile
H_0^A: σ²_α = 0             F_A = MS_A/MS_E       F[a − 1, abc(n − 1); 1 − α]
  versus H_1^A: σ²_α > 0
H_0^B: all β_j = 0          F_B = MS_B/MS_AB      F[b − 1, (a − 1)(b − 1); 1 − α]
  versus H_1^B: not all β_j = 0
H_0^C: all γ_k = 0          F_C = MS_C/MS_AC      F[c − 1, (a − 1)(c − 1); 1 − α]
  versus H_1^C: not all γ_k = 0
H_0^AB: σ²_αβ = 0           F_AB = MS_AB/MS_E     F[(a − 1)(b − 1), abc(n − 1); 1 − α]
  versus H_1^AB: σ²_αβ > 0
H_0^AC: σ²_αγ = 0           F_AC = MS_AC/MS_E     F[(a − 1)(c − 1), abc(n − 1); 1 − α]
  versus H_1^AC: σ²_αγ > 0
H_0^BC: all (βγ)_jk = 0     F_BC = MS_BC/MS_ABC   F[(b − 1)(c − 1), (a − 1)(b − 1)(c − 1); 1 − α]
  versus H_1^BC: not all (βγ)_jk = 0
H_0^ABC: σ²_αβγ = 0         F_ABC = MS_ABC/MS_E   F[(a − 1)(b − 1)(c − 1), abc(n − 1); 1 − α]
  versus H_1^ABC: σ²_αβγ > 0


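is the t value with m being the number of intervals constructed. Similarly,
100(1 − α) percent Bonferroni intervals for the contrast

$$L = \sum_{i=1}^{a}\ell_i\alpha_i \qquad \left(\sum_{i=1}^{a}\ell_i = 0\right)$$

are determined by

$$\hat{L} - \delta\sqrt{\frac{MS_E}{bcn}\sum_{i=1}^{a}\ell_i^2} \le L \le \hat{L} + \delta\sqrt{\frac{MS_E}{bcn}\sum_{i=1}^{a}\ell_i^2}, \tag{5.6.3}$$

where $\hat{L} = \sum_{i=1}^{a}\ell_i\bar{y}_{i...}$.

Under Model III, with A fixed and B and C random, one is typically interested
in contrasts of the form $\sum_{i=1}^{a}\ell_i\alpha_i$ ($\sum_{i=1}^{a}\ell_i = 0$); however, no exact interval for
a linear contrast is available. To see this, note that $\sum_{i=1}^{a}\ell_i\hat{\alpha}_i = \sum_{i=1}^{a}\ell_i\bar{y}_{i...}$, with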
TABLE 5.7
Estimates of Parameters and Their Variances under Model I*

Parameter                           Point Estimate                          Variance of Estimate
μ                                   ȳ_....                                  σ_e²/abcn
α_i                                 ȳ_i... − ȳ_....                         (a − 1)σ_e²/abcn
μ + α_i                             ȳ_i...                                  σ_e²/bcn
μ + α_i + β_j + (αβ)_ij             ȳ_ij..                                  σ_e²/cn
μ + α_i + β_j + γ_k + (αβ)_ij       ȳ_ijk.                                  σ_e²/n
  + (αγ)_ik + (βγ)_jk + (αβγ)_ijk
α_i − α_i′                          ȳ_i... − ȳ_i′...                        2σ_e²/bcn
Σ_i ℓ_i α_i (Σ_i ℓ_i = 0)           Σ_i ℓ_i ȳ_i...                          (Σ_i ℓ_i²)σ_e²/bcn
(αβ)_ij                             ȳ_ij.. − ȳ_i... − ȳ_.j.. + ȳ_....       (a − 1)(b − 1)σ_e²/abcn
(αβγ)_ijk                           ȳ_ijk. − ȳ_ij.. − ȳ_i.k. − ȳ_.jk.       (a − 1)(b − 1)(c − 1)σ_e²/abcn
                                      + ȳ_i... + ȳ_.j.. + ȳ_..k. − ȳ_....
σ_e²                                MS_E                                    2σ_e⁴/abc(n − 1)

* Estimates for β_j, μ + β_j, (αγ)_ik, (βγ)_jk, and other parameters not included in Table 5.7 can be obtained
by interchanging appropriate subscripts in the estimates shown in the table.


$\operatorname{Var}(\sum_{i=1}^{a}\ell_i\bar{y}_{i...}) = (\sum_{i=1}^{a}\ell_i^2)(\sigma_e^2 + n\sigma_{\alpha\beta\gamma}^2 + cn\sigma_{\alpha\beta}^2 + bn\sigma_{\alpha\gamma}^2)/(bcn)$. Thus,
$\operatorname{Var}(\sum_{i=1}^{a}\ell_i\bar{y}_{i...})$ cannot be estimated using only a single mean square in the analysis
of variance Table 5.2. Approximate intervals can be based on the Satterthwaite
procedure and on a method due to Naik (1974). For further discussion of these
and other related procedures, including a numerical example, see Burdick and
Graybill (1992, pp. 156-160). If two of the effects are fixed and one random,
we have seen that exact tests exist for all the hypotheses of interest. Thus,
exact intervals can be constructed for all estimable functions of fixed effect
parameters. For example, with A and C fixed and B random, it can be shown
that


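$$\operatorname{Var}(\bar{y}_{i...}) = \frac{\sigma_\beta^2}{b} + \frac{(a-1)\sigma_{\alpha\beta}^2}{ab} + \frac{\sigma_e^2}{bcn} \tag{5.6.4}$$

and

$$\operatorname{Var}(\bar{y}_{i...} - \bar{y}_{i'...}) = \frac{2(\sigma_e^2 + cn\sigma_{\alpha\beta}^2)}{bcn}. \tag{5.6.5}$$

Remark: The results (5.6.4) and (5.6.5) can be derived as follows. First, since the
random and mixed terms of different types are independent of each other, their cross-products
have expected value equal to zero. Further, it follows that $\operatorname{Var}\{(\alpha\beta)_{ij}\} = E[(\alpha\beta)_{ij}^2] =
[(a-1)/a]\sigma_{\alpha\beta}^2$, and $E[(\alpha\beta)_{ij}(\alpha\beta)_{ij'}] = 0$ for $j \ne j'$. Similar results hold for $(\beta\gamma)_{jk}$
and $(\alpha\beta\gamma)_{ijk}$. Now, $E(\bar{y}_{i...}) = \mu + \alpha_i$ and $\operatorname{Var}(\bar{y}_{i...})$ is given by the expression below,
where the terms in $\gamma_k$, $(\alpha\gamma)_{ik}$, $(\beta\gamma)_{jk}$, and $(\alpha\beta\gamma)_{ijk}$ vanish because they sum to zero over the
levels of the fixed factor C:

$$\begin{aligned}
\operatorname{Var}(\bar{y}_{i...}) &= E[\bar{y}_{i...} - \mu - \alpha_i]^2 \\
&= E\left[\frac{1}{bcn}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n}\{\beta_j + (\alpha\beta)_{ij} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + e_{ijk\ell}\}\right]^2 \\
&= E\left[\frac{1}{b}\sum_{j=1}^{b}\beta_j + \frac{1}{b}\sum_{j=1}^{b}(\alpha\beta)_{ij} + \frac{1}{bcn}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n}e_{ijk\ell}\right]^2 \\
&= \frac{b\sigma_\beta^2}{b^2} + \frac{(a-1)b\sigma_{\alpha\beta}^2}{ab^2} + \frac{bcn\sigma_e^2}{b^2c^2n^2} \\
&= \frac{\sigma_\beta^2}{b} + \frac{(a-1)\sigma_{\alpha\beta}^2}{ab} + \frac{\sigma_e^2}{bcn}.
\end{aligned}$$

Similarly, since $E[(\alpha\beta)_{ij}(\alpha\beta)_{i'j}] = -\sigma_{\alpha\beta}^2/a$ and $E[\bar{y}_{i...} - \bar{y}_{i'...}] = \alpha_i - \alpha_{i'}$,
$\operatorname{Var}(\bar{y}_{i...} - \bar{y}_{i'...})$ is given by

$$\begin{aligned}
\operatorname{Var}(\bar{y}_{i...} - \bar{y}_{i'...}) &= E[\bar{y}_{i...} - \alpha_i - \bar{y}_{i'...} + \alpha_{i'}]^2 \\
&= E\left[\frac{1}{b}\sum_{j=1}^{b}(\alpha\beta)_{ij} - \frac{1}{b}\sum_{j=1}^{b}(\alpha\beta)_{i'j} + \frac{1}{bcn}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n}(e_{ijk\ell} - e_{i'jk\ell})\right]^2 \\
&= \frac{2(a-1)b\sigma_{\alpha\beta}^2}{ab^2} - \frac{2}{b^2}\sum_{j=1}^{b}E[(\alpha\beta)_{ij}(\alpha\beta)_{i'j}] + \frac{2\sigma_e^2}{bcn} \\
&= \frac{2(a-1)\sigma_{\alpha\beta}^2}{ab} + \frac{2b\sigma_{\alpha\beta}^2}{ab^2} + \frac{2\sigma_e^2}{bcn} \\
&= \frac{2(\sigma_e^2 + cn\sigma_{\alpha\beta}^2)}{bcn}.
\end{aligned}$$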


The analysis of variance estimates of the variance components are readily 
obtained from Table 5.2 by setting the mean squares equal to the expected 
mean squares and solving for the desired variance components. For example, 
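under Model III with A fixed and B and C random, the estimates of variance
components are:

$$\hat{\sigma}_e^2 = MS_E,$$

$$\hat{\sigma}_{\alpha\beta\gamma}^2 = (MS_{ABC} - MS_E)/n,$$

$$\hat{\sigma}_{\beta\gamma}^2 = (MS_{BC} - MS_E)/an,$$

$$\hat{\sigma}_{\alpha\gamma}^2 = (MS_{AC} - MS_{ABC})/bn,$$

$$\hat{\sigma}_{\alpha\beta}^2 = (MS_{AB} - MS_{ABC})/cn,$$

$$\hat{\sigma}_\gamma^2 = (MS_C - MS_{BC})/abn,$$

and

$$\hat{\sigma}_\beta^2 = (MS_B - MS_{BC})/acn.$$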
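These estimators are simple differences of mean squares. The Python sketch below (hypothetical function and key names) evaluates them for Model III with A fixed and B and C random. As with any analysis of variance estimator, a result can come out negative when the corresponding mean-square difference is negative; the sketch returns it unchanged rather than truncating it at zero.

```python
def variance_components_a_fixed(ms, a, b, c, n):
    """ANOVA estimates of the variance components under Model III
    (A fixed, B and C random), obtained by equating mean squares to their
    expectations. Negative estimates are returned as computed."""
    return {
        "sigma2_e": ms["E"],
        "sigma2_abg": (ms["ABC"] - ms["E"]) / n,
        "sigma2_bg": (ms["BC"] - ms["E"]) / (a * n),
        "sigma2_ag": (ms["AC"] - ms["ABC"]) / (b * n),
        "sigma2_ab": (ms["AB"] - ms["ABC"]) / (c * n),
        "sigma2_g": (ms["C"] - ms["BC"]) / (a * b * n),
        "sigma2_b": (ms["B"] - ms["BC"]) / (a * c * n),
    }


# Hypothetical mean squares, chosen to equal the expected mean squares implied
# by the estimators above when sigma_e^2 = 1, sigma_abg^2 = 0.5,
# sigma_bg^2 = 0.25, sigma_ag^2 = 0.75, sigma_ab^2 = 1.5, sigma_g^2 = 2, sigma_b^2 = 3.
ms = {"E": 1.0, "ABC": 2.0, "BC": 2.5, "AC": 8.0, "AB": 8.0, "C": 50.5, "B": 38.5}
est = variance_components_a_fixed(ms, a=3, b=4, c=2, n=2)
```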


For some results on confidence intervals for individual variance components 
and certain sums and ratios of variance components under Models II and III, 
including numerical examples, see Burdick and Graybill (1992, pp. 131-136, 
156-160). 


5.7 COMPUTATIONAL FORMULAE AND PROCEDURE 


Ordinarily, computer programs will be employed to perform the analysis of vari- 
ance calculations involving three or more factors. For completeness, however, 
we present the necessary computational formulae: 


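$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n} y_{ijk\ell}^2 - \frac{y_{....}^2}{abcn},$$

$$SS_A = \frac{1}{bcn}\sum_{i=1}^{a} y_{i...}^2 - \frac{y_{....}^2}{abcn},$$

$$SS_B = \frac{1}{acn}\sum_{j=1}^{b} y_{.j..}^2 - \frac{y_{....}^2}{abcn},$$

$$SS_C = \frac{1}{abn}\sum_{k=1}^{c} y_{..k.}^2 - \frac{y_{....}^2}{abcn},$$

$$SS_{AB} = \frac{1}{cn}\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij..}^2 - \frac{1}{bcn}\sum_{i=1}^{a} y_{i...}^2 - \frac{1}{acn}\sum_{j=1}^{b} y_{.j..}^2 + \frac{y_{....}^2}{abcn},$$

$$SS_{AC} = \frac{1}{bn}\sum_{i=1}^{a}\sum_{k=1}^{c} y_{i.k.}^2 - \frac{1}{bcn}\sum_{i=1}^{a} y_{i...}^2 - \frac{1}{abn}\sum_{k=1}^{c} y_{..k.}^2 + \frac{y_{....}^2}{abcn},$$

$$SS_{BC} = \frac{1}{an}\sum_{j=1}^{b}\sum_{k=1}^{c} y_{.jk.}^2 - \frac{1}{acn}\sum_{j=1}^{b} y_{.j..}^2 - \frac{1}{abn}\sum_{k=1}^{c} y_{..k.}^2 + \frac{y_{....}^2}{abcn},$$

$$\begin{aligned}
SS_{ABC} ={}& \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} y_{ijk.}^2 - \frac{1}{cn}\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij..}^2 - \frac{1}{bn}\sum_{i=1}^{a}\sum_{k=1}^{c} y_{i.k.}^2 - \frac{1}{an}\sum_{j=1}^{b}\sum_{k=1}^{c} y_{.jk.}^2 \\
&+ \frac{1}{bcn}\sum_{i=1}^{a} y_{i...}^2 + \frac{1}{acn}\sum_{j=1}^{b} y_{.j..}^2 + \frac{1}{abn}\sum_{k=1}^{c} y_{..k.}^2 - \frac{y_{....}^2}{abcn},
\end{aligned}$$

and

$$SS_E = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n} y_{ijk\ell}^2 - \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} y_{ijk.}^2.$$

Alternatively, $SS_E$ can be obtained from the relation:

$$SS_E = SS_T - SS_A - SS_B - SS_C - SS_{AB} - SS_{AC} - SS_{BC} - SS_{ABC}.$$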
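The computational formulae work directly with totals and the correction term $y_{....}^2/abcn$. The sketch below is a hypothetical standard-library helper, assuming the same nested-list layout `y[i][j][k]` (list of the n replicates per cell) and balanced data. It applies the formulae to the main effects, the A × B interaction, the total, and the error; the remaining interactions follow the same pattern.

```python
def ss_from_totals(y):
    """Selected sums of squares via the computational (totals) formulae.

    y[i][j][k] is the list of the n observations in cell (i, j, k) of a
    balanced layout.
    """
    a, b, c, n = len(y), len(y[0]), len(y[0][0]), len(y[0][0][0])

    def sq(values):
        """Sum of squared values."""
        return sum(v * v for v in values)

    obs = [x for plane in y for row in plane for cell in row for x in cell]
    cf = sum(obs) ** 2 / (a * b * c * n)  # correction term y_{....}^2 / abcn

    t_i = [sum(x for row in y[i] for cell in row for x in cell) for i in range(a)]
    t_j = [sum(x for i in range(a) for cell in y[i][j] for x in cell)
           for j in range(b)]
    t_k = [sum(x for i in range(a) for j in range(b) for x in y[i][j][k])
           for k in range(c)]
    t_ij = [sum(x for cell in y[i][j] for x in cell)
            for i in range(a) for j in range(b)]
    t_cell = [sum(cell) for plane in y for row in plane for cell in row]

    return {
        "A": sq(t_i) / (b * c * n) - cf,
        "B": sq(t_j) / (a * c * n) - cf,
        "C": sq(t_k) / (a * b * n) - cf,
        "AB": sq(t_ij) / (c * n) - sq(t_i) / (b * c * n) - sq(t_j) / (a * c * n) + cf,
        "T": sq(obs) - cf,
        "E": sq(obs) - sq(t_cell) / n,
    }
```

For balanced data these values agree with the definitional formulas of Section 5.3; the totals form simply avoids computing every mean.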


Remark: The preceding computational formulae can be readily extended if four or 
more factors are involved. In Section 5.11, we illustrate some of these computational 
formulae for the case of the four-factor crossed classification model. 


5.8 POWER OF THE ANALYSIS OF VARIANCE F TESTS 


Under Model I, the power of each of the F tests summarized in Table 5.3 can be
obtained in the manner described for the one-way and two-way classification
models. The normalized noncentrality parameter $\phi$ needed for calculating the
power of each F test can be obtained as follows:

$$\phi = \frac{1}{\sigma_e}\left[\frac{\text{Numerator of Second Term in the Expected Mean Squares Column in Table 5.2}}{\text{Corresponding Degrees of Freedom} + 1}\right]^{1/2}.$$

For example, for testing the null hypothesis that the three-factor interaction
ABC is zero, we have

$$\phi_{ABC} = \frac{1}{\sigma_e}\left[\frac{n\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}(\alpha\beta\gamma)_{ijk}^2}{(a-1)(b-1)(c-1) + 1}\right]^{1/2}.$$

For the two-factor interaction AB,

$$\phi_{AB} = \frac{1}{\sigma_e}\left[\frac{cn\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2}{(a-1)(b-1) + 1}\right]^{1/2},$$

and for the main effect A,

$$\phi_A = \frac{1}{\sigma_e}\left[\frac{bcn\sum_{i=1}^{a}\alpha_i^2}{a}\right]^{1/2},$$

and so on.
In an analogous manner, under Model II, for testing the null hypothesis that
the three-factor interaction ABC is zero, we have

$$\lambda_{ABC} = \left[1 + \frac{n\sigma_{\alpha\beta\gamma}^2}{\sigma_e^2}\right]^{1/2}.$$

For the two-factor interaction AB,

$$\lambda_{AB} = \left[1 + \frac{cn\sigma_{\alpha\beta}^2}{\sigma_e^2 + n\sigma_{\alpha\beta\gamma}^2}\right]^{1/2}.$$

For testing the main effects, we have seen that no exact F tests are available.
An approximate power of the pseudo-F tests discussed in Section 5.5 may,
however, be computed (see, e.g., Scheffé (1959, p. 248)).
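The quantities $\phi$ and $\lambda$ above are straightforward to evaluate. The sketch below uses hypothetical function names and illustrative numbers. It computes $\phi$ from the numerator of the second expected-mean-square term and its degrees of freedom, along with the Model II quantities $\lambda_{ABC}$ and $\lambda_{AB}$. Turning these into a power value still requires power charts or a noncentral F routine, which is not attempted here.

```python
import math


def phi_model_one(numerator, df, sigma_e):
    """Normalized noncentrality phi for a Model I F test (Section 5.8).

    numerator: numerator of the second term of the expected mean square,
    e.g. b*c*n*sum(alpha_i**2) for factor A; df: its degrees of freedom.
    """
    return math.sqrt(numerator / (df + 1)) / sigma_e


def lambda_abc_model_two(sigma2_abg, sigma2_e, n):
    """lambda_ABC = sqrt(1 + n*sigma2_abg/sigma2_e) under Model II."""
    return math.sqrt(1 + n * sigma2_abg / sigma2_e)


def lambda_ab_model_two(sigma2_ab, sigma2_abg, sigma2_e, c, n):
    """lambda_AB = sqrt(1 + c*n*sigma2_ab/(sigma2_e + n*sigma2_abg)) under Model II."""
    return math.sqrt(1 + c * n * sigma2_ab / (sigma2_e + n * sigma2_abg))


# Illustrative values (not from the text): factor A with a = 3 levels and
# effects -1, 0, 1; b = c = 2, n = 3, sigma_e = 1.
alphas = [-1.0, 0.0, 1.0]
b, c, n = 2, 2, 3
phi_a = phi_model_one(b * c * n * sum(x * x for x in alphas), len(alphas) - 1, 1.0)
```

For factor A this reduces to the expression for $\phi_A$ above, since the degrees of freedom plus one equal a.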


5.9 MULTIPLE COMPARISON METHODS 


As before, multiple comparison methods can be utilized for the fixed as well as 
mixed effects models to test contrasts among cell means or factor level means. 
We briefly indicate the procedure for the fixed effects case. 

When the null hypothesis about a certain main effect or interaction is rejected,
Tukey, Scheffé, or other multiple comparison methods may be used to
investigate specific contrasts of interest. For example, if $H_0^{ABC}$ is rejected, we
may be interested in comparing the cell means

$$\mu_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk}.$$

Then multiple comparison methods may be used to investigate the general
contrasts of the type

$$L = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\ell_{ijk}\mu_{ijk},$$

where

$$\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\ell_{ijk} = 0.$$

An unbiased estimator of L is

$$\hat{L} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\ell_{ijk}\bar{y}_{ijk.},$$

for which the estimated variance is

$$\widehat{\operatorname{Var}}(\hat{L}) = \frac{MS_E}{n}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\ell_{ijk}^2.$$

For Tukey’s method involving pairwise comparisons, we have

$$T = q[abc, abc(n-1); 1-\alpha].$$

For Scheffé’s method involving general contrasts, we would have

$$S^2 = (abc-1)F[abc-1, abc(n-1); 1-\alpha].$$

Furthermore, if $H_0^A$ is rejected, one may proceed to investigate contrasts
involving the $\alpha_i$'s of the form

$$L = \sum_{i=1}^{a}\ell_i\alpha_i,$$

where

$$\sum_{i=1}^{a}\ell_i = 0.$$

Again, an unbiased estimator of L is

$$\hat{L} = \sum_{i=1}^{a}\ell_i\bar{y}_{i...},$$

with estimated variance

$$\widehat{\operatorname{Var}}(\hat{L}) = \frac{MS_E}{bcn}\sum_{i=1}^{a}\ell_i^2.$$

For Tukey’s procedure involving pairwise differences, we have

$$T = q[a, abc(n-1); 1-\alpha].$$

For Scheffé’s procedure involving general contrasts, we would have

$$S^2 = (a-1)F[a-1, abc(n-1); 1-\alpha].$$

Contrasts based on the $\beta_j$'s and $\gamma_k$'s can be investigated in an analogous manner.
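For Model I contrasts among the factor A means, the estimate and its estimated variance combine into an interval of the form $\hat{L} \pm (\text{critical value})\sqrt{\widehat{\operatorname{Var}}(\hat{L})}$. The Bonferroni interval (5.6.3) has this form with critical value δ. The sketch below assumes the critical multiplier (for example δ, or $S = \sqrt{(a-1)F[a-1, abc(n-1); 1-\alpha]}$ for Scheffé's method) has already been obtained from tables. The function name `contrast_interval` is hypothetical.

```python
import math


def contrast_interval(level_means, coefs, ms_e, b, c, n, crit):
    """Estimate L = sum(l_i * alpha_i) under Model I; return (lower, L_hat, upper).

    level_means: the factor A means ybar_{i...}; coefs: contrast coefficients
    summing to zero; crit: critical multiplier supplied by the caller (for
    example delta = t[abc(n-1), 1 - alpha/2m] for Bonferroni intervals).
    """
    if len(coefs) != len(level_means):
        raise ValueError("need one coefficient per level")
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    l_hat = sum(l * m for l, m in zip(coefs, level_means))
    # Estimated variance of L_hat is (MS_E / bcn) * sum(l_i^2).
    se = math.sqrt(ms_e / (b * c * n) * sum(l * l for l in coefs))
    return l_hat - crit * se, l_hat, l_hat + crit * se
```

For example, `contrast_interval([10.0, 12.0, 15.0], [1, -1, 0], 8.0, 2, 2, 2, 2.0)` compares the first two levels using an assumed multiplier of 2.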


5.10 THREE-WAY CLASSIFICATION WITH ONE 
OBSERVATION PER CELL 


If there is only one observation per cell in model (5.1.1) (i.e., n = 1), we cannot
estimate the error variance $\sigma_e^2$ from within-cell replications. In this case,
analysis of variance tests can be conducted only if it is possible to make an
additional assumption that certain interactions are zero. Usually, we would
assume that there is no three-factor interaction A × B × C. If it is possible
to assume that the A × B × C interaction is zero, then the corresponding mean
square $MS_{ABC}$ has expectation $\sigma_e^2$ and can be used as the error mean square
$MS_E$ to estimate the error variance $\sigma_e^2$. However, this layout does not allow
separation of the three-factor interaction term from the within-cell variation or
the error term.
The analysis of variance model in this case is written as 


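$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + e_{ijk}, \qquad \begin{cases} i = 1, 2, \ldots, a \\ j = 1, 2, \ldots, b \\ k = 1, 2, \ldots, c \end{cases} \qquad (5.10.1)$$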


All sums of squares and mean squares are calculated in the usual manner except 
that now n = 1. The definitional and computational formulae for the sums of 
squares are: 


SSa = be Si, -y.y= - Sy — —y?, 
i=l j 


be a abc 
? 2 | , 2 ] 2 
SS = y;—y = —_ ; 
B=ac D (Vi. — Y..) - D Yi. The 
SS = ab 05 —¥ ——— y? _ y? 
.K wee ab .K abc wee? 


k=] k=1 
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l = : 2 I = 2 I 2 2 
arp Dp Dean ro Deere See ane 
i=] j=l i=] jJ=1 
a Cc 
SSac =D) Yi — Fi. — Fe +I 
i=] k=1] 
=5L Doe et ae 
Be aR be La gh Lak ae 
b c 
SSsc =a) jk — 9.5. —Fa tI, 
J=1 k=! 


lI 
Q | — 
Me 
Ne 
ww, 
o 
| 
Q 
a le 
N< 
~N 
| 
S| - 
M4: 
Ne 
ho 
o 
S| 
a 
S 
Ve 
2 ON 


jJ=1 k=!1 j=! k=] 
and 

a b c 

SSz = > (Vijk — Vij. — Vik — Dijk +I. FIG +I AIL 
i=] j=l k=1 
a b Cc ] a b ] a Cc ] b Cc 

2 2 2 2 
_ » Yijk ~ 7 » ip dy Dik vik 

as f ct 4 b 4 a + 
i=] j=l] k=] i=] j=l i=] k=1 J=1 k=!1 


be — Yi. ac ate ab L k abe. 
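The computational formulae above can be checked numerically. The following is a minimal Python sketch, assuming the data are stored as nested lists `y[i][j][k]` with one observation per cell; the function name is hypothetical. As an illustration it uses the melting-point data of Table 5.16 (Section 5.15), indexed thermometer, week, analyst. The A x B x C sum of squares is returned under the key `"E"`, since it plays the role of the error term here.

```python
def three_way_ss_one_obs(y):
    """Sums of squares for a three-way layout with one observation per cell.

    y[i][j][k] is the single observation in cell (i, j, k). The A x B x C
    interaction is assumed zero, so its sum of squares is returned as "E".
    """
    a, b, c = len(y), len(y[0]), len(y[0][0])
    I, J, K = range(a), range(b), range(c)
    cf = sum(y[i][j][k] for i in I for j in J for k in K) ** 2 / (a * b * c)  # y...^2 / abc
    # Sums of squared marginal totals, each divided by the observations per total
    q_a = sum(sum(y[i][j][k] for j in J for k in K) ** 2 for i in I) / (b * c)
    q_b = sum(sum(y[i][j][k] for i in I for k in K) ** 2 for j in J) / (a * c)
    q_c = sum(sum(y[i][j][k] for i in I for j in J) ** 2 for k in K) / (a * b)
    q_ab = sum(sum(y[i][j][k] for k in K) ** 2 for i in I for j in J) / c
    q_ac = sum(sum(y[i][j][k] for j in J) ** 2 for i in I for k in K) / b
    q_bc = sum(sum(y[i][j][k] for i in I) ** 2 for j in J for k in K) / a
    raw = sum(y[i][j][k] ** 2 for i in I for j in J for k in K)
    return {
        "A": q_a - cf, "B": q_b - cf, "C": q_c - cf,
        "AB": q_ab - q_a - q_b + cf,
        "AC": q_ac - q_a - q_c + cf,
        "BC": q_bc - q_b - q_c + cf,
        "E": raw - q_ab - q_ac - q_bc + q_a + q_b + q_c - cf,
        "Total": raw - cf,
    }


# Melting-point data of Table 5.16: melting[thermometer][week][analyst]
melting = [
    [[174.0, 173.0, 173.5], [173.5, 173.0, 173.0], [174.5, 173.5, 173.0]],
    [[173.0, 172.0, 173.0], [173.5, 173.0, 173.5], [173.0, 173.5, 172.5]],
    [[171.5, 171.0, 173.0], [172.5, 172.0, 173.0], [173.0, 171.5, 172.5]],
]
ss = three_way_ss_one_obs(melting)
for source, value in ss.items():
    print(f"{source:>5}: {value:.4f}")
```

The seven components returned before `"Total"` add up to the total sum of squares, which is a convenient arithmetic check.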


The resulting analysis of variance table is shown in Table 5.8. In the case of 
Model I, all mean squares are tested against the mean square for error (three- 
factor interaction). When the null hypothesis about a certain main effect or a 
two-factor interaction is rejected, Tukey, Scheffé or other multiple comparison 
methods may be used to investigate contrasts of interest. Under Models II 
and III, the appropriate test statistics for various hypotheses of interest can 
be determined by examining expected mean squares in Table 5.8. However, 
again, there are no exact F tests for testing the hypotheses about main effects 
under Model II. Pseudo-F tests discussed earlier in Section 5.5 can similarly 
be developed. 


5.11 FOUR-WAY CROSSED CLASSIFICATION 


The analysis of variance in a four-way classification is obtained as a straight-
forward generalization of the three-way classification and we discuss it only
briefly. The model is given by


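$$\begin{aligned} y_{ijk\ell m} = {} & \mu + \alpha_i + \beta_j + \gamma_k + \delta_\ell + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\alpha\delta)_{i\ell} \\ & + (\beta\gamma)_{jk} + (\beta\delta)_{j\ell} + (\gamma\delta)_{k\ell} \\ & + (\alpha\beta\gamma)_{ijk} + (\alpha\beta\delta)_{ij\ell} + (\alpha\gamma\delta)_{ik\ell} \\ & + (\beta\gamma\delta)_{jk\ell} + (\alpha\beta\gamma\delta)_{ijk\ell} + e_{ijk\ell m}, \end{aligned} \qquad \begin{cases} i = 1, \ldots, a \\ j = 1, \ldots, b \\ k = 1, \ldots, c \\ \ell = 1, \ldots, d \\ m = 1, \ldots, n \end{cases} \qquad (5.11.1)$$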


where the $e_{ijk\ell m}$'s are independently and normally distributed with zero mean and
variance $\sigma_e^2$. The assumptions on the other effects can be stated analogously,
depending upon whether a factor is fixed or random. Note that the model equation
(5.11.1) has 17 terms: a general mean, one main effect for each of the four
factors, six two-factor interactions, four three-factor interactions, one four-factor
interaction, and a residual or error term.

The usual identity $y_{ijk\ell m} - \bar{y}_{.....} = $ etc. contains the following groups of terms
on its right-hand side:

(i) Estimates of the four main effects, for example, $\bar{y}_{i....} - \bar{y}_{.....}$, which gives
an estimate of $\alpha_i$.

(ii) Estimates of the six two-way interactions, for example, $\bar{y}_{ij...} - \bar{y}_{i....} - \bar{y}_{.j...} + \bar{y}_{.....}$, which gives an estimate of $(\alpha\beta)_{ij}$.

(iii) Estimates of the four three-way interactions, for example, $\bar{y}_{ijk..} - \bar{y}_{ij...} - \bar{y}_{i.k..} - \bar{y}_{.jk..} + \bar{y}_{i....} + \bar{y}_{.j...} + \bar{y}_{..k..} - \bar{y}_{.....}$, which gives an estimate of
$(\alpha\beta\gamma)_{ijk}$.

(iv) An estimate of the single four-way interaction, which has the form
$\bar{y}_{ijk\ell.} - [\bar{y}_{.....}$ + four main effects + six two-way interactions + four
three-way interactions$]$.

(v) The deviations of the individual observations from the cell means, for
example, $y_{ijk\ell m} - \bar{y}_{ijk\ell.}$.


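The partition of the total sum of squares is effected by squaring and summing
over all indices on both sides of the identity $y_{ijk\ell m} - \bar{y}_{.....} = $ and so on. The
typical sums of squares and corresponding computational formulae are:

$$SS_A = bcdn\sum_{i=1}^{a}(\bar{y}_{i....} - \bar{y}_{.....})^2 = \frac{1}{bcdn}\sum_{i=1}^{a} y_{i....}^2 - \frac{1}{abcdn}\,y_{.....}^2,$$

$$\begin{aligned} SS_{AB} &= cdn\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{y}_{ij...} - \bar{y}_{i....} - \bar{y}_{.j...} + \bar{y}_{.....})^2 \\ &= \frac{1}{cdn}\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij...}^2 - \frac{1}{abcdn}\,y_{.....}^2 - SS_A - SS_B, \end{aligned}$$

$$\begin{aligned} SS_{ABC} &= dn\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}(\bar{y}_{ijk..} - \bar{y}_{ij...} - \bar{y}_{i.k..} - \bar{y}_{.jk..} + \bar{y}_{i....} + \bar{y}_{.j...} + \bar{y}_{..k..} - \bar{y}_{.....})^2 \\ &= \frac{1}{dn}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} y_{ijk..}^2 - \frac{1}{abcdn}\,y_{.....}^2 - (SS_A + SS_B + SS_C + SS_{AB} + SS_{AC} + SS_{BC}), \end{aligned}$$

$$\begin{aligned} SS_{ABCD} = {} & \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{d} y_{ijk\ell.}^2 - \frac{1}{abcdn}\,y_{.....}^2 - (SS_A + SS_B + SS_C + SS_D \\ & + SS_{AB} + SS_{AC} + SS_{AD} + SS_{BC} + SS_{BD} + SS_{CD} \\ & + SS_{ABC} + SS_{ABD} + SS_{ACD} + SS_{BCD}), \end{aligned}$$

and

$$\begin{aligned} SS_E &= \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{d}\sum_{m=1}^{n}(y_{ijk\ell m} - \bar{y}_{ijk\ell.})^2 \\ &= \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{d}\sum_{m=1}^{n} y_{ijk\ell m}^2 - \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{d} y_{ijk\ell.}^2. \end{aligned}$$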


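The degrees of freedom for the preceding sums of squares are $a - 1$, $(a-1)(b-1)$,
$(a-1)(b-1)(c-1)$, $(a-1)(b-1)(c-1)(d-1)$, and $abcd(n-1)$,
respectively. The expected mean squares can be derived as before.³ For example,
under Model II,

$$\begin{aligned} E(MS_A) &= \sigma_e^2 + n\sigma^2_{\alpha\beta\gamma\delta} + dn\sigma^2_{\alpha\beta\gamma} + cn\sigma^2_{\alpha\beta\delta} + bn\sigma^2_{\alpha\gamma\delta} + cdn\sigma^2_{\alpha\beta} \\ &\quad + bdn\sigma^2_{\alpha\gamma} + bcn\sigma^2_{\alpha\delta} + bcdn\sigma^2_{\alpha}, \\ E(MS_{AB}) &= \sigma_e^2 + n\sigma^2_{\alpha\beta\gamma\delta} + dn\sigma^2_{\alpha\beta\gamma} + cn\sigma^2_{\alpha\beta\delta} + cdn\sigma^2_{\alpha\beta}, \\ E(MS_{ABC}) &= \sigma_e^2 + n\sigma^2_{\alpha\beta\gamma\delta} + dn\sigma^2_{\alpha\beta\gamma}, \\ E(MS_{ABCD}) &= \sigma_e^2 + n\sigma^2_{\alpha\beta\gamma\delta}, \end{aligned}$$

and

$$E(MS_E) = \sigma_e^2.$$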


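³ One can use the rules formulated by Schultz (1955) and others to reproduce the results on expected
mean squares rather quickly. See Appendix U for a discussion of rules for finding expected mean
squares.

306 The Analysis of Variance

Expected mean squares under Model III depend on the particular combination
of fixed and random factors. For example, for an experiment with A and C fixed,
and B and D random, the $(\alpha\beta\gamma\delta)_{ijk\ell}$'s are assumed to be distributed with mean
zero and variance $\sigma^2_{\alpha\beta\gamma\delta}$, subject to the restrictions that
$\sum_{i=1}^{a}(\alpha\beta\gamma\delta)_{ijk\ell} = 0 = \sum_{k=1}^{c}(\alpha\beta\gamma\delta)_{ijk\ell}$. The assumptions imply that

$$E\left[\sum_{i=1}^{a}\sum_{k=1}^{c}(\alpha\beta\gamma\delta)^2_{ijk\ell}\right] = (a-1)(c-1)\,\sigma^2_{\alpha\beta\gamma\delta},$$

$$E\left[(\alpha\beta\gamma\delta)^2_{ijk\ell}\right] = \frac{(a-1)(c-1)}{ac}\,\sigma^2_{\alpha\beta\gamma\delta},$$

$$E\left[(\alpha\beta\gamma\delta)_{ijk\ell}(\alpha\beta\gamma\delta)_{i'j'k'\ell'}\right] = 0 \quad \text{for } j \neq j',\ \ell \neq \ell', \text{ or both } j \neq j' \text{ and } \ell \neq \ell',$$

$$E\left[(\alpha\beta\gamma\delta)_{ijk\ell}(\alpha\beta\gamma\delta)_{i'jk\ell}\right] = -\frac{c-1}{ac}\,\sigma^2_{\alpha\beta\gamma\delta}, \quad i \neq i',$$

$$E\left[(\alpha\beta\gamma\delta)_{ijk\ell}(\alpha\beta\gamma\delta)_{ijk'\ell}\right] = -\frac{a-1}{ac}\,\sigma^2_{\alpha\beta\gamma\delta}, \quad k \neq k',$$

and

$$E\left[(\alpha\beta\gamma\delta)_{ijk\ell}(\alpha\beta\gamma\delta)_{i'jk'\ell}\right] = \frac{1}{ac}\,\sigma^2_{\alpha\beta\gamma\delta}, \quad i \neq i' \text{ and } k \neq k'.$$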


Now, the results on expected mean squares follow readily. Finally, for a given 
model — fixed, random, or mixed — the point and interval estimates and tests of 
hypotheses corresponding to parameters of interest can be developed analogously
to the results for the three-way classification.


Remark: For a balanced crossed classification model involving only random effects,
there are some simple rules to calculate the coefficients of the variance components in
the expected mean squares. The rules are stated as follows for a model containing four
factors.

(a) All expected mean squares contain $\sigma_e^2$ with coefficient 1.

(b) The coefficient of a variance component is either zero or $abcdn$ divided by the product
of the levels of the factors contained in the variance component. For example,
the coefficient of $\sigma^2_{\alpha\beta\gamma\delta}$ is equal to $abcdn/abcd = n$.

(c) The coefficient of a variance component in the expected mean square of a
main factor or of an interaction between factors is zero if the product of the levels
of the factors contained in the variance component cannot be divided by the
level of the factor or by the product of the levels of the factors (the levels being treated
as symbols rather than numbers). For example, the coefficient of $\sigma^2_{\beta\gamma}$ in $E(MS_A)$ is zero
since $bc$ cannot be divided by $a$. Similarly, the coefficient of $\sigma^2_{\beta\gamma\delta}$ in $E(MS_{AB})$ is zero
since $bcd$ cannot be divided by $ab$.

(d) A quick check on the correctness of the coefficients of variance components can
be made by noting that, for a given variance component, the weighted sum of the
coefficients corresponding to all the mean squares, including that of the mean,
is $abcdn$, where the weights are the degrees of freedom.
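Rules (a) to (d) are mechanical enough to code. The following is a minimal Python sketch for an all-random balanced crossed model, assuming rule (c) is read symbolically: a component appears in a source's expected mean square only when every factor of the source also appears in the component. The function names and the numbers of levels in the example are hypothetical; the levels are chosen distinct so that each coefficient can be traced to its product of letters.

```python
from itertools import combinations
from math import prod


def ems_coefficients(levels, n):
    """Coefficients of the variance components in each E(MS) of a balanced,
    all-random crossed model, following rules (a)-(c) of the Remark.

    levels -- dict mapping a factor label to its number of levels
    n      -- observations per cell
    Returns {source: {component: coefficient}}; sources and components are
    frozensets of labels (frozenset() is the mean) and "e" is the error.
    """
    total = n * prod(levels.values())  # abcdn when there are four factors
    labels = sorted(levels)
    effects = [frozenset(s) for r in range(1, len(labels) + 1)
               for s in combinations(labels, r)]
    table = {}
    for source in [frozenset()] + effects:
        row = {"e": 1}  # rule (a): sigma_e^2 always has coefficient 1
        for comp in effects:
            # rule (c), read symbolically: nonzero only if the component
            # contains every factor of the source
            if source <= comp:
                # rule (b): total divided by the product of the component's levels
                row[comp] = total // prod(levels[f] for f in comp)
        table[source] = row
    return table


def dof(source, levels):
    """Degrees of freedom of a source; the mean (empty source) has 1."""
    return prod(levels[f] - 1 for f in source)


def label(key):
    return key if key == "e" else "".join(sorted(key))


levels = {"A": 2, "B": 3, "C": 4, "D": 5}  # hypothetical numbers of levels
n = 2
ems = ems_coefficients(levels, n)

# E(MS_AB): expect e, AB (cdn), ABC (dn), ABD (cn), ABCD (n)
for comp, coef in sorted(ems[frozenset("AB")].items(),
                         key=lambda kv: (len(label(kv[0])), label(kv[0]))):
    print(label(comp), coef)

# Rule (d): for each component, the df-weighted sum of its coefficients is abcdn
grand_n = n * prod(levels.values())
for comp in ems[frozenset()]:
    if comp != "e":
        assert sum(dof(s, levels) * row.get(comp, 0) for s, row in ems.items()) == grand_n
```

The check in the last loop works because the degrees of freedom of all sources contained in a component sum to the product of that component's levels.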


5.12 HIGHER-ORDER CROSSED CLASSIFICATIONS 


The reader should now be able to see how the four-way crossed classification 
analysis can be further generalized to five- and higher-order classifications. 
The formal symmetry in the sums of squares and degrees of freedom for the 
balanced case makes direct generalizations to the higher-order crossed
classification models quite straightforward. For example, a full $p$-way crossed
classification involving $p$ crossed factors contains $2^p + 1$ terms in the model
equation: a general mean, one main effect for each of the $p$ factors, $\binom{p}{2}$ two-
factor interactions, $\binom{p}{3}$ three-factor interactions, and so on; and the total number
of main effects and interactions to be tested is equal to $2^p - 1$. Computational
formulae given in the preceding section can be readily extended if more than 
four factors are studied simultaneously. However, when the number of factors 
is large, the algebra becomes extremely tedious and the amount of computa- 
tional work increases rapidly. Most of the algebra can be simplified by the 
use of “operators” as discussed by Bankier (1960a,b). The details on mecha- 
nization of the computational procedure on a digital computer can be found 
in the papers of Hartley (1956), Hemmerle (1964), and Bock (1963), and in 
the books by Peng (1967, pp. 47-50), Cooley and Lohnes (1962), and Dixon 
(1992). Hartley (1962) has suggested a simple and ingenious device of using 
a factorial analysis of variance without replication (with as many factors as 
necessary) to analyze many other designs on a digital computer, where data 
from any design are presented and analyzed as though they were a factorial 
experiment. 
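The term counts quoted above follow from the binomial theorem, since the binomial coefficients for orders 1 through p sum to 2^p - 1. A minimal Python sketch (the function name is hypothetical) makes the bookkeeping explicit; for p = 4 it reproduces the 17 terms of model (5.11.1).

```python
from math import comb


def crossed_model_terms(p):
    """Count the terms in the model equation of a full p-way crossed classification."""
    by_order = {r: comb(p, r) for r in range(1, p + 1)}  # r = 1: main effects; r >= 2: interactions
    effects = sum(by_order.values())                       # equals 2**p - 1
    return {
        "by_order": by_order,
        "effects_tested": effects,
        "total_terms": 1 + effects + 1,  # general mean + effects + error term
    }


for p in (3, 4, 5):
    t = crossed_model_terms(p)
    print(p, t["by_order"], t["effects_tested"], t["total_terms"])
```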

There are several procedures for deriving expected mean squares in an anal- 
ysis of variance involving higher-order crossed classification models. They are, 
however, more readily written down following an easy set of rules. The in- 
terested reader is referred to papers by Schultz (1955), Cornfield and Tukey 
(1956), Millman and Glass (1967), Henderson (1959, 1969), Lorenzen (1977), 
and Blackwell et al. (1991), as well as to books by Bennett and Franklin (1954),
Scheffé (1959), and Lorenzen and Anderson (1993) for detailed discussions of 
these rules. A brief description of these rules is given in Appendix U. Finally, it 
should be stressed that in a higher-order crossed classification involving many 
factors, the complexity of the experiment as well as the analysis of data in- 
creases as the number of factors becomes large. In addition to providing a large 
number of experimental units, there are many interaction terms that must be 
evaluated and interpreted. Moreover, the tasks of evaluating the expected mean 
squares and performing the tests of significance for each source of variation also 
become increasingly complex. One common source of difficulty encountered 
in analyzing higher-order random and mixed factorials is that there is often no 
appropriate error term against which to test a given mean square. Frequently, 
the appropriate tests are carried out using an approximate procedure due to
Satterthwaite (1946). It should, however, be mentioned that although the num- 
ber of interaction terms in a higher-order classification increases rather rapidly, 
in many cases these interactions are so remote and difficult to interpret that they 
are frequently ignored and their sums of squares and degrees of freedom pooled 
with the residual. 
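As an illustration of the Satterthwaite (1946) procedure mentioned above, the following minimal Python sketch computes the approximate degrees of freedom of a linear combination of mean squares, using the familiar form ν = (Σ cᵢ MSᵢ)² / Σ (cᵢ MSᵢ)²/νᵢ. The example combination MS_AB + MS_AC − MS_ABC is the one that, in a three-way random model, matches E(MS_A) apart from the A component, so it can serve as the denominator of a pseudo-F test of A. The function name and all numerical values are hypothetical.

```python
def satterthwaite(coeffs, mean_squares, dfs):
    """Return (value, approximate df) for the linear combination sum(c_i * MS_i).

    Uses nu = (sum c_i MS_i)^2 / sum((c_i MS_i)^2 / nu_i).
    Note: with negative coefficients the combination itself can be negative,
    in which case the approximation is not usable.
    """
    value = sum(c * ms for c, ms in zip(coeffs, mean_squares))
    denom = sum((c * ms) ** 2 / v for c, ms, v in zip(coeffs, mean_squares, dfs))
    return value, value ** 2 / denom


# Hypothetical mean squares and degrees of freedom for AB, AC, and ABC
ms = {"AB": 4.0, "AC": 2.0, "ABC": 1.0}
nu = {"AB": 4, "AC": 4, "ABC": 8}
denominator, df_approx = satterthwaite(
    [1, 1, -1], [ms["AB"], ms["AC"], ms["ABC"]], [nu["AB"], nu["AC"], nu["ABC"]]
)
print(denominator, round(df_approx, 3))
```

The pseudo-F ratio would then be MS_A divided by `denominator`, referred to an F distribution with (a − 1, `df_approx`) degrees of freedom.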

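We outline below an analysis of variance for the $r$-way classification involving
factors $A_1, A_2, \ldots, A_r$. Let $a_i$ be the number of levels associated with the
factor $A_i$ $(i = 1, 2, \ldots, r)$, and suppose there are $n$ observations to be taken
at every combination of the levels of $A_1, A_2, \ldots, A_r$. The model for an $r$-way
classification can be written as

$$\begin{aligned} y_{i_1 i_2 \cdots i_r s} = {} & \mu + (\alpha_1)_{i_1} + \cdots + (\alpha_r)_{i_r} + (\alpha_1\alpha_2)_{i_1 i_2} + \cdots + (\alpha_{r-1}\alpha_r)_{i_{r-1} i_r} \\ & + (\alpha_1\alpha_2\alpha_3)_{i_1 i_2 i_3} + \cdots + (\alpha_{r-2}\alpha_{r-1}\alpha_r)_{i_{r-2} i_{r-1} i_r} + \cdots \\ & + (\alpha_1\alpha_2\cdots\alpha_r)_{i_1 i_2 \cdots i_r} + e_{i_1 i_2 \cdots i_r s}, \end{aligned} \qquad \begin{cases} i_1 = 1, 2, \ldots, a_1 \\ i_2 = 1, 2, \ldots, a_2 \\ \quad \vdots \\ i_r = 1, 2, \ldots, a_r \\ s = 1, 2, \ldots, n \end{cases} \qquad (5.12.1)$$

where $y_{i_1 i_2 \cdots i_r s}$ is the $s$-th observation corresponding to the $i_1$-th level of $A_1$,
the $i_2$-th level of $A_2$, ..., and the $i_r$-th level of $A_r$; $-\infty < \mu < \infty$ is a constant; $(\alpha_j)_{i_j}$
is the effect of the $i_j$-th level of $A_j$ $(j = 1, 2, \ldots, r)$; $(\alpha_j\alpha_k)_{i_j i_k}$ is the
effect of the interaction between the $i_j$-th level of $A_j$ and the $i_k$-th level of
$A_k$ $(j < k = 1, 2, \ldots, r)$; $(\alpha_j\alpha_k\alpha_\ell)_{i_j i_k i_\ell}$ is the interaction between the $i_j$-th level of
$A_j$, the $i_k$-th level of $A_k$, and the $i_\ell$-th level of $A_\ell$ $(j < k < \ell = 1, 2, \ldots, r)$;
...; $(\alpha_1\alpha_2\cdots\alpha_r)_{i_1 i_2 \cdots i_r}$ is the interaction between the $i_1$-th level of $A_1$, the
$i_2$-th level of $A_2$, ..., and the $i_r$-th level of $A_r$; and finally $e_{i_1 i_2 \cdots i_r s}$ is the customary
error term.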

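The usual identity $y_{i_1 i_2 \cdots i_r s} - \bar{y}_{\cdot\cdot\cdots\cdot} = $ etc. contains the following groups of
terms on its right-hand side:

• Estimates of $r$ main effects; e.g., $\bar{y}_{\cdots i_j \cdots} - \bar{y}_{\cdot\cdot\cdots\cdot}$, which gives an estimate
of $(\alpha_j)_{i_j}$ $(j = 1, 2, \ldots, r)$.

• Estimates of $\binom{r}{2}$ two-way interactions; e.g., $\bar{y}_{\cdots i_j \cdots i_k \cdots} - \bar{y}_{\cdots i_j \cdots} - \bar{y}_{\cdots i_k \cdots} + \bar{y}_{\cdot\cdot\cdots\cdot}$, which gives an estimate of $(\alpha_j\alpha_k)_{i_j i_k}$ $(j < k = 1, 2, \ldots, r)$.

• ...

• Estimate of the single $r$-way interaction, of the form $\bar{y}_{i_1 i_2 \cdots i_r .} - [\bar{y}_{\cdot\cdot\cdots\cdot}$ +
$r$ main effects + $\binom{r}{2}$ two-way interactions + ... + $\binom{r}{r-1}$ $(r-1)$-way
interactions$]$.

• The deviations of the individual observations from the cell means; e.g.,
$y_{i_1 i_2 \cdots i_r s} - \bar{y}_{i_1 i_2 \cdots i_r .}$.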
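The partition of the total sum of squares is effected by squaring and summing
over all indices of the identity $y_{i_1 i_2 \cdots i_r s} - \bar{y}_{\cdot\cdot\cdots\cdot} = $ etc. The typical sums of squares
can be expressed as follows:

$$SS_{A_1} = a_2 a_3 \cdots a_r n \sum_{i_1=1}^{a_1}(\bar{y}_{i_1 \cdot\cdots\cdot} - \bar{y}_{\cdot\cdot\cdots\cdot})^2, \quad \text{etc.},$$

$$SS_{A_1A_2} = a_3 a_4 \cdots a_r n \sum_{i_1=1}^{a_1}\sum_{i_2=1}^{a_2}(\bar{y}_{i_1 i_2 \cdot\cdots\cdot} - \bar{y}_{i_1 \cdot\cdots\cdot} - \bar{y}_{\cdot i_2 \cdot\cdots\cdot} + \bar{y}_{\cdot\cdot\cdots\cdot})^2, \quad \text{etc.},$$

$$\begin{aligned} SS_{A_1A_2\cdots A_r} = n\sum_{i_1=1}^{a_1}\sum_{i_2=1}^{a_2}\cdots\sum_{i_r=1}^{a_r}\Big[ & \bar{y}_{i_1 i_2 \cdots i_r .} - (\bar{y}_{i_1 i_2 \cdots i_{r-1} \cdot\cdot} + \cdots + \bar{y}_{\cdot i_2 \cdots i_r .}) + \cdots \\ & + (-1)^{r-1}(\bar{y}_{i_1 \cdot\cdots\cdot} + \cdots + \bar{y}_{\cdot\cdots\cdot i_r .}) + (-1)^r\,\bar{y}_{\cdot\cdot\cdots\cdot}\Big]^2, \end{aligned}$$

and

$$SS_E = \sum_{i_1=1}^{a_1}\sum_{i_2=1}^{a_2}\cdots\sum_{i_r=1}^{a_r}\sum_{s=1}^{n}(y_{i_1 i_2 \cdots i_r s} - \bar{y}_{i_1 i_2 \cdots i_r .})^2.$$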
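All of these computational formulae follow one pattern: in the balanced case, the sum of squares for a set S of factors is an alternating (inclusion-exclusion) sum, over all subsets T of S, of the quantities Σ(marginal total over T)² divided by the number of observations in each such total. The following minimal Python sketch implements that pattern; `balanced_effect_ss` is a hypothetical helper. It needs only the cell totals, so SS_E must still be obtained from the raw observations. As a check it is applied to the cell totals of Table 5.11 (Section 5.14), where it should reproduce the sums of squares computed there up to rounding.

```python
from collections import defaultdict
from itertools import combinations


def balanced_effect_ss(cell_totals, levels, n):
    """Sums of squares for every main effect and interaction of a balanced r-way layout.

    cell_totals -- dict mapping a cell index tuple (i1, ..., ir) to the cell total
    levels      -- tuple (a1, ..., ar) of numbers of levels
    n           -- observations per cell
    Returns a dict keyed by tuples of factor positions, e.g. (0,) or (0, 2).
    """
    r = len(levels)
    total_obs = n
    for a in levels:
        total_obs *= a

    def q(subset):
        # Sum of squared marginal totals over the factors in `subset`, each divided
        # by the number of observations it contains; q(()) = y^2 / N.
        margins = defaultdict(float)
        for idx, t in cell_totals.items():
            margins[tuple(idx[f] for f in subset)] += t
        per_margin = total_obs
        for f in subset:
            per_margin //= levels[f]
        return sum(t * t for t in margins.values()) / per_margin

    qs = {s: q(s) for k in range(r + 1) for s in combinations(range(r), k)}
    # SS_S = sum over subsets T of S of (-1)^(|S| - |T|) * q(T)
    return {
        s: sum((-1) ** (len(s) - m) * qs[t] for m in range(len(s) + 1) for t in combinations(s, m))
        for k in range(1, r + 1) for s in combinations(range(r), k)
    }


# Cell totals of Table 5.11, indexed (temperature, time, degreasing); n = 4 wires per cell
table_5_11 = {
    (0, 0, 0): 73.6, (0, 0, 1): 74.7, (0, 1, 0): 77.8, (0, 1, 1): 79.6, (0, 2, 0): 80.8, (0, 2, 1): 80.8,
    (1, 0, 0): 84.0, (1, 0, 1): 86.2, (1, 1, 0): 90.4, (1, 1, 1): 90.2, (1, 2, 0): 93.2, (1, 2, 1): 94.4,
    (2, 0, 0): 91.1, (2, 0, 1): 92.9, (2, 1, 0): 107.0, (2, 1, 1): 104.8, (2, 2, 0): 105.9, (2, 2, 1): 106.0,
}
names = "ABC"  # A = temperature, B = time, C = degreasing
effect_ss = balanced_effect_ss(table_5_11, (3, 3, 2), 4)
for s, value in effect_ss.items():
    print("SS_" + "".join(names[f] for f in s), round(value, 4))
```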


The degrees of freedom for the above sums of squares are $a_1 - 1$, $(a_1 - 1)(a_2 - 1)$, ...,
$(a_1 - 1)(a_2 - 1)\cdots(a_r - 1)$, and $a_1 a_2 \cdots a_r(n - 1)$, respectively.

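Under Model I, the $(\alpha_j)_{i_j}$'s $(j = 1, 2, \ldots, r)$, $(\alpha_j\alpha_k)_{i_j i_k}$'s $(j < k = 1, 2, \ldots, r)$,
$(\alpha_j\alpha_k\alpha_\ell)_{i_j i_k i_\ell}$'s $(j < k < \ell = 1, 2, \ldots, r)$, ..., and $(\alpha_1\alpha_2\cdots\alpha_r)_{i_1 i_2 \cdots i_r}$'s are
assumed to be constants subject to the following restrictions:

$$\sum_{i_j=1}^{a_j}(\alpha_j)_{i_j} = 0, \quad j = 1, 2, \ldots, r;$$

$$\sum_{i_j=1}^{a_j}(\alpha_j\alpha_k)_{i_j i_k} = \sum_{i_k=1}^{a_k}(\alpha_j\alpha_k)_{i_j i_k} = 0, \quad j < k = 1, 2, \ldots, r;$$

$$\sum_{i_j=1}^{a_j}(\alpha_j\alpha_k\alpha_\ell)_{i_j i_k i_\ell} = \sum_{i_k=1}^{a_k}(\alpha_j\alpha_k\alpha_\ell)_{i_j i_k i_\ell} = \sum_{i_\ell=1}^{a_\ell}(\alpha_j\alpha_k\alpha_\ell)_{i_j i_k i_\ell} = 0, \quad j < k < \ell = 1, 2, \ldots, r;$$

$$\vdots$$

and

$$\sum_{i_1=1}^{a_1}(\alpha_1\alpha_2\cdots\alpha_r)_{i_1 i_2 \cdots i_r} = \sum_{i_2=1}^{a_2}(\alpha_1\alpha_2\cdots\alpha_r)_{i_1 i_2 \cdots i_r} = \cdots = \sum_{i_r=1}^{a_r}(\alpha_1\alpha_2\cdots\alpha_r)_{i_1 i_2 \cdots i_r} = 0.$$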


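Furthermore, the $e_{i_1 i_2 \cdots i_r s}$'s are uncorrelated and randomly distributed with zero
means and variance $\sigma_e^2$. The expected mean squares are obtained as follows:

$$E(MS_{A_1}) = \sigma_e^2 + \frac{a_2 a_3 \cdots a_r n}{a_1 - 1}\sum_{i_1=1}^{a_1}(\alpha_1)_{i_1}^2, \quad \text{etc.},$$

$$E(MS_{A_1A_2}) = \sigma_e^2 + \frac{a_3 a_4 \cdots a_r n}{(a_1 - 1)(a_2 - 1)}\sum_{i_1=1}^{a_1}\sum_{i_2=1}^{a_2}(\alpha_1\alpha_2)_{i_1 i_2}^2, \quad \text{etc.},$$

$$E(MS_{A_1A_2\cdots A_r}) = \sigma_e^2 + \frac{n}{(a_1 - 1)(a_2 - 1)\cdots(a_r - 1)}\sum_{i_1=1}^{a_1}\sum_{i_2=1}^{a_2}\cdots\sum_{i_r=1}^{a_r}(\alpha_1\alpha_2\cdots\alpha_r)_{i_1 i_2 \cdots i_r}^2,$$

and

$$E(MS_E) = \sigma_e^2.$$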


Assuming normality for the error terms, all tests of hypotheses are carried out 
by an appropriate F statistic obtained as the ratio of the mean square of the 
effect being tested to the error mean square. 

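Under Model II, the $(\alpha_j)_{i_j}$'s, $(\alpha_j\alpha_k)_{i_j i_k}$'s, $(\alpha_j\alpha_k\alpha_\ell)_{i_j i_k i_\ell}$'s, ..., $(\alpha_1\alpha_2\cdots\alpha_r)_{i_1 i_2 \cdots i_r}$'s,
and $e_{i_1 i_2 \cdots i_r s}$'s are assumed to be mutually and completely uncorrelated
random variables with zero means and variances $\sigma^2_{\alpha_j}$, $\sigma^2_{\alpha_j\alpha_k}$, $\sigma^2_{\alpha_j\alpha_k\alpha_\ell}$, ...,
$\sigma^2_{\alpha_1\alpha_2\cdots\alpha_r}$, and $\sigma_e^2$, respectively. From (5.12.1), the variance of any observation
is

$$\mathrm{Var}(y_{i_1 i_2 \cdots i_r s}) = \sigma^2_{\alpha_1} + \cdots + \sigma^2_{\alpha_r} + \sigma^2_{\alpha_1\alpha_2} + \cdots + \sigma^2_{\alpha_{r-1}\alpha_r} + \cdots + \sigma^2_{\alpha_1\alpha_2\cdots\alpha_r} + \sigma_e^2,$$

and, thus, $\sigma^2_{\alpha_1}, \ldots, \sigma^2_{\alpha_r}, \sigma^2_{\alpha_1\alpha_2}, \ldots, \sigma^2_{\alpha_{r-1}\alpha_r}, \ldots, \sigma^2_{\alpha_1\alpha_2\cdots\alpha_r}$, and $\sigma_e^2$ are the variance
components of model (5.12.1). The expected value of the mean square corresponding
to any source, for example, the interaction $A_{j_1} \times A_{j_2} \times \cdots \times A_{j_m}$, is
obtained as follows:

$$\sigma_e^2 + n\sum q_{k_1 k_2 \cdots k_p}\,\sigma^2_{\alpha_{k_1}\alpha_{k_2}\cdots\alpha_{k_p}},$$

where the summation is carried over all the variance components except $\sigma_e^2$.
The coefficients $q_{k_1 k_2 \cdots k_p}$ of the variance components are given by

$$q_{k_1 k_2 \cdots k_p} = \begin{cases} \dfrac{a_1 a_2 \cdots a_r}{a_{k_1} a_{k_2} \cdots a_{k_p}}, & \text{if } (j_1, j_2, \ldots, j_m) \text{ is a subset of } (k_1, k_2, \ldots, k_p), \\ 0, & \text{otherwise.} \end{cases}$$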


Remark: In many factorial experiments involving an r-way classification where the
levels of the factors correspond to a fixed measured quantity, such as levels of temperature,
quantity of fertilizer, etc., the investigator is often interested in studying the nature of 
the response surface. For instance, she may want to determine the value at which the 
response surface is maximum or minimum. For discussions of the response surface 
methodology, the reader is referred to Myers (1976), Box and Draper (1987), Myers and 
Montgomery (1995), and Khuri and Cornell (1996). 


5.13 UNEQUAL SAMPLE SIZES IN THREE- AND 
HIGHER-ORDER CLASSIFICATIONS 


When the sample sizes in three- or higher-order crossed classifications are 
not all equal, the procedures described in Section 4.10 can be used with the 
customary modifications. The formulae for the two-way model need simply 
be extended for experiments involving three and more factors. However, the 
computation of the analysis of variance in the general case of disproportionate 
frequencies tends to be extremely involved. Various aspects of the analysis of 
nonorthogonal three- and higher-order classifications have been considered by 
a number of authors. For further discussions and details the interested reader is 
referred to Kendall et al. (1983, Sections 35.43 and 35.44) and references cited 
therein. 

In the following, we outline an analysis of variance for the unbalanced three-way
crossed classification and indicate its extension to higher-order classifications.
The model for the three-way crossed classification remains the same as
in (5.1.1), except that the sample size corresponding to the $(i, j, k)$-th cell will
now be denoted by $n_{ijk}$. We consider the analysis when the unequal sample
sizes follow a proportional pattern; that is,

$$n_{ijk} = \frac{n_{i..}\,n_{.j.}\,n_{..k}}{N^2}.$$
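For this case, the analysis of variance discussed earlier in this chapter can
be employed with suitable modifications. For example, the definitional and
computational formulae for the sums of squares are:

$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n_{ijk}}(y_{ijk\ell} - \bar{y}_{....})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n_{ijk}} y_{ijk\ell}^2 - \frac{y_{....}^2}{N},$$

$$SS_A = \sum_{i=1}^{a} n_{i..}(\bar{y}_{i...} - \bar{y}_{....})^2 = \sum_{i=1}^{a}\frac{y_{i...}^2}{n_{i..}} - \frac{y_{....}^2}{N},$$

$$SS_B = \sum_{j=1}^{b} n_{.j.}(\bar{y}_{.j..} - \bar{y}_{....})^2 = \sum_{j=1}^{b}\frac{y_{.j..}^2}{n_{.j.}} - \frac{y_{....}^2}{N},$$

$$SS_C = \sum_{k=1}^{c} n_{..k}(\bar{y}_{..k.} - \bar{y}_{....})^2 = \sum_{k=1}^{c}\frac{y_{..k.}^2}{n_{..k}} - \frac{y_{....}^2}{N},$$

$$\begin{aligned} SS_{AB} &= \sum_{i=1}^{a}\sum_{j=1}^{b} n_{ij.}(\bar{y}_{ij..} - \bar{y}_{i...} - \bar{y}_{.j..} + \bar{y}_{....})^2 \\ &= \sum_{i=1}^{a}\sum_{j=1}^{b}\frac{y_{ij..}^2}{n_{ij.}} - \sum_{i=1}^{a}\frac{y_{i...}^2}{n_{i..}} - \sum_{j=1}^{b}\frac{y_{.j..}^2}{n_{.j.}} + \frac{y_{....}^2}{N}, \end{aligned}$$

$$\begin{aligned} SS_{AC} &= \sum_{i=1}^{a}\sum_{k=1}^{c} n_{i.k}(\bar{y}_{i.k.} - \bar{y}_{i...} - \bar{y}_{..k.} + \bar{y}_{....})^2 \\ &= \sum_{i=1}^{a}\sum_{k=1}^{c}\frac{y_{i.k.}^2}{n_{i.k}} - \sum_{i=1}^{a}\frac{y_{i...}^2}{n_{i..}} - \sum_{k=1}^{c}\frac{y_{..k.}^2}{n_{..k}} + \frac{y_{....}^2}{N}, \end{aligned}$$

$$\begin{aligned} SS_{BC} &= \sum_{j=1}^{b}\sum_{k=1}^{c} n_{.jk}(\bar{y}_{.jk.} - \bar{y}_{.j..} - \bar{y}_{..k.} + \bar{y}_{....})^2 \\ &= \sum_{j=1}^{b}\sum_{k=1}^{c}\frac{y_{.jk.}^2}{n_{.jk}} - \sum_{j=1}^{b}\frac{y_{.j..}^2}{n_{.j.}} - \sum_{k=1}^{c}\frac{y_{..k.}^2}{n_{..k}} + \frac{y_{....}^2}{N}, \end{aligned}$$

$$SS_{ABC} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} n_{ijk}(\bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{i.k.} - \bar{y}_{.jk.} + \bar{y}_{i...} + \bar{y}_{.j..} + \bar{y}_{..k.} - \bar{y}_{....})^2,$$

and

$$SS_E = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n_{ijk}}(y_{ijk\ell} - \bar{y}_{ijk.})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n_{ijk}} y_{ijk\ell}^2 - \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\frac{y_{ijk.}^2}{n_{ijk}},$$

where

$$y_{ijk.} = \sum_{\ell=1}^{n_{ijk}} y_{ijk\ell}, \quad \bar{y}_{ijk.} = y_{ijk.}/n_{ijk},$$

$$y_{ij..} = \sum_{k=1}^{c} y_{ijk.}, \quad \bar{y}_{ij..} = y_{ij..}/n_{ij.},$$

$$y_{i.k.} = \sum_{j=1}^{b} y_{ijk.}, \quad \bar{y}_{i.k.} = y_{i.k.}/n_{i.k},$$

$$y_{.jk.} = \sum_{i=1}^{a} y_{ijk.}, \quad \bar{y}_{.jk.} = y_{.jk.}/n_{.jk},$$

$$y_{i...} = \sum_{j=1}^{b} y_{ij..}, \quad \bar{y}_{i...} = y_{i...}/n_{i..},$$

$$y_{.j..} = \sum_{i=1}^{a} y_{ij..}, \quad \bar{y}_{.j..} = y_{.j..}/n_{.j.},$$

$$y_{..k.} = \sum_{i=1}^{a} y_{i.k.}, \quad \bar{y}_{..k.} = y_{..k.}/n_{..k},$$

$$y_{....} = \sum_{i=1}^{a} y_{i...} = \sum_{j=1}^{b} y_{.j..} = \sum_{k=1}^{c} y_{..k.}, \quad \bar{y}_{....} = y_{....}/N,$$

$$n_{ij.} = \sum_{k=1}^{c} n_{ijk}, \quad n_{i.k} = \sum_{j=1}^{b} n_{ijk}, \quad n_{.jk} = \sum_{i=1}^{a} n_{ijk},$$

$$n_{i..} = \sum_{j=1}^{b} n_{ij.}, \quad n_{.j.} = \sum_{i=1}^{a} n_{ij.}, \quad n_{..k} = \sum_{i=1}^{a} n_{i.k},$$

and

$$N = \sum_{i=1}^{a} n_{i..} = \sum_{j=1}^{b} n_{.j.} = \sum_{k=1}^{c} n_{..k}.$$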
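Before applying these formulae, one may want to confirm that the cell sizes actually follow the proportional pattern $n_{ijk} = n_{i..}n_{.j.}n_{..k}/N^2$. The following is a minimal Python sketch of such a check; the function name and the sample sizes in the example are hypothetical. Cell sizes built as products $u_i v_j w_k$ are always proportional, and the balanced case $n_{ijk} = n$ is a special case.

```python
def is_proportional(n_cells, tol=1e-9):
    """Check whether cell sizes n[i][j][k] satisfy n_ijk = n_i.. * n_.j. * n_..k / N**2."""
    a, b, c = len(n_cells), len(n_cells[0]), len(n_cells[0][0])
    I, J, K = range(a), range(b), range(c)
    n_i = [sum(n_cells[i][j][k] for j in J for k in K) for i in I]
    n_j = [sum(n_cells[i][j][k] for i in I for k in K) for j in J]
    n_k = [sum(n_cells[i][j][k] for i in I for j in J) for k in K]
    N = sum(n_i)
    return all(
        abs(n_cells[i][j][k] - n_i[i] * n_j[j] * n_k[k] / N ** 2) <= tol
        for i in I for j in J for k in K
    )


# Hypothetical sizes built as products u_i * v_j * w_k are proportional
u, v, w = [1, 2], [1, 3], [2, 1]
prop = [[[u[i] * v[j] * w[k] for k in range(2)] for j in range(2)] for i in range(2)]
print(is_proportional(prop))                                  # True
print(is_proportional([[[1, 1], [1, 1]], [[1, 1], [1, 2]]]))  # False
```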


The analysis of variance with expected mean squares for the fixed effects model 
TABLE 5.9
Analysis of Variance for the Unbalanced Fixed Effects Model in
(5.1.1) with Proportional Frequencies

Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | Expected Mean Square
Due to A | $a-1$ | $SS_A$ | $MS_A$ | $\sigma_e^2 + \frac{1}{a-1}\sum_{i=1}^{a} n_{i..}\,\alpha_i^2$
Due to B | $b-1$ | $SS_B$ | $MS_B$ | $\sigma_e^2 + \frac{1}{b-1}\sum_{j=1}^{b} n_{.j.}\,\beta_j^2$
Due to C | $c-1$ | $SS_C$ | $MS_C$ | $\sigma_e^2 + \frac{1}{c-1}\sum_{k=1}^{c} n_{..k}\,\gamma_k^2$
Interaction A × B | $(a-1)(b-1)$ | $SS_{AB}$ | $MS_{AB}$ | $\sigma_e^2 + \frac{1}{(a-1)(b-1)}\sum_{i=1}^{a}\sum_{j=1}^{b} n_{ij.}(\alpha\beta)_{ij}^2$
Interaction A × C | $(a-1)(c-1)$ | $SS_{AC}$ | $MS_{AC}$ | $\sigma_e^2 + \frac{1}{(a-1)(c-1)}\sum_{i=1}^{a}\sum_{k=1}^{c} n_{i.k}(\alpha\gamma)_{ik}^2$
Interaction B × C | $(b-1)(c-1)$ | $SS_{BC}$ | $MS_{BC}$ | $\sigma_e^2 + \frac{1}{(b-1)(c-1)}\sum_{j=1}^{b}\sum_{k=1}^{c} n_{.jk}(\beta\gamma)_{jk}^2$
Interaction A × B × C | $(a-1)(b-1)(c-1)$ | $SS_{ABC}$ | $MS_{ABC}$ | $\sigma_e^2 + \frac{1}{(a-1)(b-1)(c-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} n_{ijk}(\alpha\beta\gamma)_{ijk}^2$
Error | $N - abc$ | $SS_E$ | $MS_E$ | $\sigma_e^2$


is given in Table 5.9. Under the assumption of normality, the test procedures 
are performed as in the case of corresponding balanced analysis. The expected 
mean squares for the random and mixed effects in the general case of dispropor- 
tional frequencies are extremely involved and the interested reader is referred 
to Blischke (1966) for further information and details. 

The model for the $r$-way crossed classification remains the same as in
(5.12.1), except that the sample size corresponding to the $(i_1, i_2, \ldots, i_r)$-th cell
will now be denoted by $n_{i_1 i_2 \cdots i_r}$. The analysis when the unequal sample sizes
follow a proportional pattern follows readily on the lines of three-way crossed 
classification outlined above. For further information and details, the reader is 
referred to Blischke (1968). 


5.14 WORKED EXAMPLE FOR MODEL I 


Anderson and Bancroft (1952, p. 291) reported data from an experiment designed
to study the effect of electrolytic chromium plate as a source for the
chromium impregnation of low-carbon steel wire. The experiment involved 18
treatments obtained as combinations of three diffusion temperatures (2200°F,
2350°F, 2500°F), three diffusion times (4, 8, and 12 hours), and two degreasing
treatments (yes and no). Each treatment was applied to four wires, giving a total
of 72 determinations of average resistivity (in m-ohms/cm³), which was the
variable being studied. The data are given in Table 5.10.

The data in Table 5.10 can be regarded as a three-way classification with
four observations per cell. Note that all three factors, temperature, time, and
degreasing, should be regarded as fixed effects since the interest is directed only
to the levels of the factors included in the experiment. The mathematical model
for the experiment would be

$$y_{ijk\ell} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + e_{ijk\ell}, \qquad \begin{cases} i = 1, 2, 3 \\ j = 1, 2, 3 \\ k = 1, 2 \\ \ell = 1, 2, 3, 4 \end{cases}$$

where $\mu$ is the general mean, $\alpha_i$ is the effect of the $i$-th level of temperature, $\beta_j$
is the effect of the $j$-th level of time, $\gamma_k$ is the effect of the $k$-th level of degreasing,
$(\alpha\beta)_{ij}$ is the interaction between the $i$-th temperature and the $j$-th time, $(\alpha\gamma)_{ik}$
is the interaction between the $i$-th temperature and the $k$-th degreasing, $(\beta\gamma)_{jk}$ is
the interaction between the $j$-th time and the $k$-th degreasing, $(\alpha\beta\gamma)_{ijk}$ is the
interaction between the $i$-th temperature, the $j$-th time, and the $k$-th degreasing,
and $e_{ijk\ell}$ is the customary error term. Furthermore, it is assumed that the $\alpha_i$'s,
$\beta_j$'s, $\gamma_k$'s, $(\alpha\beta)_{ij}$'s, $(\alpha\gamma)_{ik}$'s, $(\beta\gamma)_{jk}$'s, and $(\alpha\beta\gamma)_{ijk}$'s are constants subject to
the restrictions:

$$\sum_{i=1}^{3}\alpha_i = \sum_{j=1}^{3}\beta_j = \sum_{k=1}^{2}\gamma_k = 0,$$

$$\sum_{i=1}^{3}(\alpha\beta)_{ij} = \sum_{j=1}^{3}(\alpha\beta)_{ij} = \sum_{i=1}^{3}(\alpha\gamma)_{ik} = \sum_{k=1}^{2}(\alpha\gamma)_{ik} = \sum_{j=1}^{3}(\beta\gamma)_{jk} = \sum_{k=1}^{2}(\beta\gamma)_{jk} = 0,$$

$$\sum_{i=1}^{3}(\alpha\beta\gamma)_{ijk} = \sum_{j=1}^{3}(\alpha\beta\gamma)_{ijk} = \sum_{k=1}^{2}(\alpha\beta\gamma)_{ijk} = 0,$$

and the $e_{ijk\ell}$'s are independently and normally distributed with mean zero and
variance $\sigma_e^2$.
The first step in the analysis of variance computations is to form a three-way
table of cell totals containing $y_{ijk.} = \sum_{\ell=1}^{4} y_{ijk\ell}$ (see Table 5.11). The next step
TABLE 5.11
Cell Totals $y_{ijk.}$

Temperature | 2200°F | | | 2350°F | | | 2500°F | |
Time (hrs) | 4 | 8 | 12 | 4 | 8 | 12 | 4 | 8 | 12
Degreasing: Yes | 73.6 | 77.8 | 80.8 | 84.0 | 90.4 | 93.2 | 91.1 | 107.0 | 105.9
Degreasing: No | 74.7 | 79.6 | 80.8 | 86.2 | 90.2 | 94.4 | 92.9 | 104.8 | 106.0

TABLE 5.12
Sums over Levels of Degreasing $y_{ij..}$

Temperature (°F) | 4 hrs | 8 hrs | 12 hrs | $y_{i...}$
2200 | 148.3 | 157.4 | 161.6 | 467.3
2350 | 170.2 | 180.6 | 187.6 | 538.4
2500 | 184.0 | 211.8 | 211.9 | 607.7
$y_{.j..}$ | 502.5 | 549.8 | 561.1 | $y_{....}$ = 1,613.4

TABLE 5.13
Sums over Levels of Time $y_{i.k.}$

Temperature (°F) | Yes | No | $y_{i...}$
2200 | 232.2 | 235.1 | 467.3
2350 | 267.6 | 270.8 | 538.4
2500 | 304.0 | 303.7 | 607.7
$y_{..k.}$ | 803.8 | 809.6 | $y_{....}$ = 1,613.4


consists of forming sums over every index and every combination of indices.
Thus, we sum over the levels of degreasing to get a temperature ($i$) × time ($j$)
table containing $y_{ij..} = \sum_{k=1}^{2} y_{ijk.}$ (see Table 5.12). This table is then summed
over the levels of time to obtain $y_{i...}$ and over the levels of temperature to give
$y_{.j..}$. The sum of the $y_{i...}$'s is equal to the sum of the $y_{.j..}$'s, which is the grand total $y_{....}$.
Tables 5.13 and 5.14 are obtained similarly.
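The summing step can be sketched in a few lines of Python, assuming the cell totals of Table 5.11 are stored as nested lists `cells[i][j][k]` (temperature, time, degreasing). The variable names are illustrative only; rounding to one decimal place removes floating-point noise so the results can be compared with Tables 5.12 to 5.14.

```python
# Cell totals y_ijk. of Table 5.11: cells[i][j][k], with i = temperature
# (2200, 2350, 2500 F), j = time (4, 8, 12 hrs), k = degreasing (yes, no)
cells = [
    [[73.6, 74.7], [77.8, 79.6], [80.8, 80.8]],
    [[84.0, 86.2], [90.4, 90.2], [93.2, 94.4]],
    [[91.1, 92.9], [107.0, 104.8], [105.9, 106.0]],
]
T, H, D = range(3), range(3), range(2)

# Table 5.12: sum over degreasing -> y_ij..
temp_time = [[round(sum(cells[i][j][k] for k in D), 1) for j in H] for i in T]
# Table 5.13: sum over time -> y_i.k.
temp_degr = [[round(sum(cells[i][j][k] for j in H), 1) for k in D] for i in T]
# Table 5.14: sum over temperature -> y_.jk.
time_degr = [[round(sum(cells[i][j][k] for i in T), 1) for k in D] for j in H]

y_i = [round(sum(row), 1) for row in temp_time]                # y_i...
y_j = [round(sum(temp_time[i][j] for i in T), 1) for j in H]   # y_.j..
grand = round(sum(y_i), 1)                                     # y....
print(temp_time, y_i, y_j, grand)
```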
TABLE 5.14
Sums over Levels of Temperature $y_{.jk.}$

Time (hrs) | Yes | No | $y_{.j..}$
4 | 248.7 | 253.8 | 502.5
8 | 275.2 | 274.6 | 549.8
12 | 279.9 | 281.2 | 561.1
$y_{..k.}$ | 803.8 | 809.6 | $y_{....}$ = 1,613.4


With these preliminary results, the subsequent analysis of variance calculations
are fairly straightforward. Thus, from the computational formulae of
Section 5.7, we have

$$\begin{aligned} SS_T &= (17.9)^2 + (18.0)^2 + \cdots + (26.9)^2 - \frac{(1{,}613.4)^2}{3\times3\times2\times4} \\ &= 36{,}677.86 - 36{,}153.6050 \\ &= 524.2550, \end{aligned}$$

$$\begin{aligned} SS_A &= \frac{1}{3\times2\times4}\{(467.3)^2 + (538.4)^2 + (607.7)^2\} - \frac{(1{,}613.4)^2}{3\times3\times2\times4} \\ &= 36{,}564.2975 - 36{,}153.6050 \\ &= 410.6925, \end{aligned}$$

$$\begin{aligned} SS_B &= \frac{1}{3\times2\times4}\{(502.5)^2 + (549.8)^2 + (561.1)^2\} - \frac{(1{,}613.4)^2}{3\times3\times2\times4} \\ &= 36{,}234.1458 - 36{,}153.6050 \\ &= 80.5408, \end{aligned}$$

$$\begin{aligned} SS_C &= \frac{1}{3\times3\times4}\{(803.8)^2 + (809.6)^2\} - \frac{(1{,}613.4)^2}{3\times3\times2\times4} \\ &= 36{,}154.0722 - 36{,}153.6050 \\ &= 0.4672, \end{aligned}$$

$$\begin{aligned} SS_{AB} &= \frac{1}{2\times4}\{(148.3)^2 + (157.4)^2 + \cdots + (211.9)^2\} - \frac{(1{,}613.4)^2}{3\times3\times2\times4} - SS_A - SS_B \\ &= 36{,}659.6525 - 36{,}153.6050 - 410.6925 - 80.5408 \\ &= 14.8142, \end{aligned}$$

$$\begin{aligned} SS_{AC} &= \frac{1}{3\times4}\{(232.2)^2 + (235.1)^2 + \cdots + (303.7)^2\} - \frac{(1{,}613.4)^2}{3\times3\times2\times4} - SS_A - SS_C \\ &= 36{,}565.0783 - 36{,}153.6050 - 410.6925 - 0.4672 \\ &= 0.3136, \end{aligned}$$

$$\begin{aligned} SS_{BC} &= \frac{1}{3\times4}\{(248.7)^2 + (253.8)^2 + \cdots + (281.2)^2\} - \frac{(1{,}613.4)^2}{3\times3\times2\times4} - SS_B - SS_C \\ &= 36{,}235.3150 - 36{,}153.6050 - 80.5408 - 0.4672 \\ &= 0.7020, \end{aligned}$$

and

$$\begin{aligned} SS_{ABC} &= \frac{1}{4}\{(73.6)^2 + (74.7)^2 + \cdots + (106.0)^2\} - \frac{(1{,}613.4)^2}{3\times3\times2\times4} \\ &\quad - SS_{AB} - SS_{AC} - SS_{BC} - SS_A - SS_B - SS_C \\ &= 36{,}662.01 - 36{,}153.6050 - 14.8142 - 0.3136 - 0.7020 \\ &\quad - 410.6925 - 80.5408 - 0.4672 \\ &= 0.8747. \end{aligned}$$

Finally, by subtraction, the error sum of squares is given by

$$\begin{aligned} SS_E &= 524.2550 - 410.6925 - 80.5408 - 0.4672 - 14.8142 \\ &\quad - 0.3136 - 0.7020 - 0.8747 \\ &= 15.8500. \end{aligned}$$
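The remaining entries of the analysis of variance table (mean squares and variance ratios) follow by division. The following minimal Python sketch reproduces them from the sums of squares just computed, assuming Model I, so that every mean square is tested against MS_E. The variable names are illustrative; p-values are not computed here, but with SciPy available they could be obtained as `scipy.stats.f.sf(F, df, df_error)`.

```python
# Sums of squares and degrees of freedom from the computations above
effects = [
    ("Temperature", 410.6925, 2),
    ("Time", 80.5408, 2),
    ("Degreasing", 0.4672, 1),
    ("Temperature x Time", 14.8142, 4),
    ("Temperature x Degreasing", 0.3136, 2),
    ("Time x Degreasing", 0.7020, 2),
    ("Temperature x Time x Degreasing", 0.8747, 4),
]
ss_total, df_total = 524.2550, 3 * 3 * 2 * 4 - 1

ss_error = ss_total - sum(ss for _, ss, _ in effects)  # by subtraction, as in the text
df_error = df_total - sum(df for _, _, df in effects)  # 3*3*2*(4 - 1) = 54
ms_error = ss_error / df_error

table = {}
for name, ss, df in effects:
    ms = ss / df
    table[name] = (df, ss, ms, ms / ms_error)  # F = MS / MS_E under Model I

for name, (df, ss, ms, f) in table.items():
    print(f"{name:<33}{df:>3}{ss:>10.4f}{ms:>10.4f}{f:>9.2f}")
print(f"{'Error':<33}{df_error:>3}{ss_error:>10.4f}{ms_error:>10.4f}")
```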


These results along with the remaining calculations are entered in Table 5.15. 

From Table 5.15 it is evident that the variance ratio for the three-factor interaction
(i.e., temperature × time × degreasing) is less than one, indicating a nonsignificant
effect ($p = 0.566$). The variance ratios for the temperature × degreasing
and time × degreasing interactions are also too low to achieve any significance.
However, the variance ratio for the temperature × time interaction is
quite large and highly significant ($p < 0.001$). In terms of the main effects,
the variance ratio for the degreasing effect is relatively small and nonsignificant
($p = 0.213$). However, the variance ratios for the temperature and time main
effects are extremely large and highly significant ($p < 0.001$). Hence, we may


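TABLE 5.15
Analysis of Variance for the Resistivity Data of Table 5.10

Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | Expected Mean Square | F Value | p-Value
Temperature | 2 | 410.6925 | 205.3463 | $\sigma_e^2 + \frac{3\times2\times4}{3-1}\sum_{i=1}^{3}\alpha_i^2$ | 699.60 | <0.001
Time | 2 | 80.5408 | 40.2704 | $\sigma_e^2 + \frac{3\times2\times4}{3-1}\sum_{j=1}^{3}\beta_j^2$ | 137.20 | <0.001
Degreasing | 1 | 0.4672 | 0.4672 | $\sigma_e^2 + \frac{3\times3\times4}{2-1}\sum_{k=1}^{2}\gamma_k^2$ | 1.59 | 0.213
Temperature × Time | 4 | 14.8142 | 3.7036 | $\sigma_e^2 + \frac{2\times4}{(3-1)(3-1)}\sum_{i=1}^{3}\sum_{j=1}^{3}(\alpha\beta)_{ij}^2$ | 12.62 | <0.001
Temperature × Degreasing | 2 | 0.3136 | 0.1568 | $\sigma_e^2 + \frac{3\times4}{(3-1)(2-1)}\sum_{i=1}^{3}\sum_{k=1}^{2}(\alpha\gamma)_{ik}^2$ | 0.53 | 0.589
Time × Degreasing | 2 | 0.7020 | 0.3510 | $\sigma_e^2 + \frac{3\times4}{(3-1)(2-1)}\sum_{j=1}^{3}\sum_{k=1}^{2}(\beta\gamma)_{jk}^2$ | 1.20 | 0.310
Temperature × Time × Degreasing | 4 | 0.8747 | 0.2187 | $\sigma_e^2 + \frac{4}{(3-1)(3-1)(2-1)}\sum_{i=1}^{3}\sum_{j=1}^{3}\sum_{k=1}^{2}(\alpha\beta\gamma)_{ijk}^2$ | 0.75 | 0.566
Error | 54 | 15.8500 | 0.2935 | $\sigma_e^2$ | |
Total | 71 | 524.2550 | | | |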


Three-Way and Higher-Order Crossed Classifications 321 


reach the following conclusions: 


(i) There is no significant three-factor (i.e., temperature × time × degreasing)
interaction.

(ii) There are no significant two-factor interactions between degreasing and either of the
other two factors, temperature and time. However, there is a significant
interaction between temperature and time.

(iii) There are significant main effects due to temperature and time. There
is no significant main effect due to degreasing.


In view of the presence of significant temperature x time interactions, the tests 
for temperature and time main effects are not particularly meaningful. The 
researcher should first investigate the nature of temperature x time interactions 
before determining whether main effects are of any practical interest. 

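To study the nature of the temperature × time interaction, suppose the
researcher wished to estimate separately the differences in average resistivities
among the three diffusion temperatures at each of the three diffusion times. The
contrasts of interest are:

$$\begin{aligned} L_1 &= \mu_{21.} - \mu_{11.}, & L_2 &= \mu_{31.} - \mu_{21.}, & L_3 &= \mu_{31.} - \mu_{11.}, \\ L_4 &= \mu_{22.} - \mu_{12.}, & L_5 &= \mu_{32.} - \mu_{22.}, & L_6 &= \mu_{32.} - \mu_{12.}, \\ L_7 &= \mu_{23.} - \mu_{13.}, & L_8 &= \mu_{33.} - \mu_{23.}, & L_9 &= \mu_{33.} - \mu_{13.}. \end{aligned}$$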


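The preceding contrasts are estimated as

$$\begin{aligned} \hat{L}_1 &= \frac{170.2}{8} - \frac{148.3}{8} = 2.74, & \hat{L}_2 &= \frac{184.0}{8} - \frac{170.2}{8} = 1.73, \\ \hat{L}_3 &= \frac{184.0}{8} - \frac{148.3}{8} = 4.46, & \hat{L}_4 &= \frac{180.6}{8} - \frac{157.4}{8} = 2.90, \\ \hat{L}_5 &= \frac{211.8}{8} - \frac{180.6}{8} = 3.90, & \hat{L}_6 &= \frac{211.8}{8} - \frac{157.4}{8} = 6.80, \\ \hat{L}_7 &= \frac{187.6}{8} - \frac{161.6}{8} = 3.25, & \hat{L}_8 &= \frac{211.9}{8} - \frac{187.6}{8} = 3.04, \\ \hat{L}_9 &= \frac{211.9}{8} - \frac{161.6}{8} = 6.29. \end{aligned}$$

The estimated variances are obtained as

$$\widehat{\mathrm{Var}}(\hat{L}_1) = \widehat{\mathrm{Var}}(\hat{L}_2) = \cdots = \widehat{\mathrm{Var}}(\hat{L}_9) = \frac{MS_E}{2\times4}\left[(1)^2 + (-1)^2\right] = \frac{0.2935}{4\times2}(2) = 0.073.$$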
4x2 


The desired 95 percent Bonferroni intervals for the contrasts of interest are
determined as

$$\hat{L}_i \pm t[54,\ 1 - 0.05/18]\sqrt{\widehat{\mathrm{Var}}(\hat{L}_i)} = \hat{L}_i \pm 2.89 \times 0.270, \quad i = 1, \ldots, 9;$$

that is,

$$\begin{aligned} 1.96 = 2.74 - 2.89 \times 0.270 &< \mu_{21.} - \mu_{11.} < 2.74 + 2.89 \times 0.270 = 3.52, \\ 0.95 = 1.73 - 2.89 \times 0.270 &< \mu_{31.} - \mu_{21.} < 1.73 + 2.89 \times 0.270 = 2.51, \\ 3.68 = 4.46 - 2.89 \times 0.270 &< \mu_{31.} - \mu_{11.} < 4.46 + 2.89 \times 0.270 = 5.24, \\ 2.12 = 2.90 - 2.89 \times 0.270 &< \mu_{22.} - \mu_{12.} < 2.90 + 2.89 \times 0.270 = 3.68, \\ 3.12 = 3.90 - 2.89 \times 0.270 &< \mu_{32.} - \mu_{22.} < 3.90 + 2.89 \times 0.270 = 4.68, \\ 6.02 = 6.80 - 2.89 \times 0.270 &< \mu_{32.} - \mu_{12.} < 6.80 + 2.89 \times 0.270 = 7.58, \\ 2.47 = 3.25 - 2.89 \times 0.270 &< \mu_{23.} - \mu_{13.} < 3.25 + 2.89 \times 0.270 = 4.03, \\ 2.26 = 3.04 - 2.89 \times 0.270 &< \mu_{33.} - \mu_{23.} < 3.04 + 2.89 \times 0.270 = 3.82, \\ 5.51 = 6.29 - 2.89 \times 0.270 &< \mu_{33.} - \mu_{13.} < 6.29 + 2.89 \times 0.270 = 7.07. \end{aligned}$$
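The nine intervals can be recomputed from Table 5.12 with a short Python sketch. It takes the critical value 2.89 quoted above as given (with SciPy available, `scipy.stats.t.ppf(1 - 0.05/18, 54)` would return it). Because the sketch does not round the estimates or the standard error before combining them, its endpoints may differ from those in the text in the second decimal place. The variable names are illustrative only.

```python
from math import sqrt

# Temperature x time totals y_ij.. of Table 5.12 (i = temperature, j = time); 8 observations each
totals = {
    (1, 1): 148.3, (1, 2): 157.4, (1, 3): 161.6,
    (2, 1): 170.2, (2, 2): 180.6, (2, 3): 187.6,
    (3, 1): 184.0, (3, 2): 211.8, (3, 3): 211.9,
}
m, ms_error = 8, 0.2935
t_crit = 2.89  # t[54, 1 - 0.05/18], the value quoted in the text

# L_1, ..., L_9: temperature pairs (i, i') compared at each time j, mu_ij. - mu_i'j.
pairs = [(2, 1), (3, 2), (3, 1)]
contrasts = [((hi, j), (lo, j)) for j in (1, 2, 3) for hi, lo in pairs]

half_width = t_crit * sqrt(ms_error / m * 2)  # coefficients +1 and -1
intervals = []
for upper_cell, lower_cell in contrasts:
    est = (totals[upper_cell] - totals[lower_cell]) / m
    intervals.append((est - half_width, est, est + half_width))

for k, (low, est, high) in enumerate(intervals, start=1):
    print(f"L{k}: estimate {est:.2f}, interval ({low:.2f}, {high:.2f})")
```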


The average resistivities for different combinations of diffusion temperatures
and times indicate that average resistivity increases when going from lower to
higher temperature levels. However, the increases are larger at diffusion times of
8 and 12 hours than at 4 hours. The fact that the influence of diffusion
temperature depends on the diffusion time implies that the temperature
and time factors interact in their effect on resistivity. In view of the important
interaction effects between temperature and time on average resistivities in the
study findings, the researcher may decide that the main effects due to temperature
and time are not meaningful or of practical importance.


5.15 WORKED EXAMPLE FOR MODEL II 


Johnson and Leone (1977, p. 861) reported data from an experiment designed to
study the melting point of a homogeneous sample of hydroquinone. The experiment
was performed with three analysts using three uncalibrated thermometers
and working in three separate weeks. The data are given in Table 5.16.

The data in Table 5.16 can be regarded as a three-way classification with one
observation per cell, and all three factors can be regarded as random effects. The
mathematical model for this experiment would be


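$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + e_{ijk}, \qquad \begin{cases} i = 1, 2, 3,\\ j = 1, 2, 3,\\ k = 1, 2, 3,\end{cases}$$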

where $\mu$ is the general mean, $\alpha_i$ is the effect of the $i$-th thermometer, $\beta_j$ is
the effect of the $j$-th week, $\gamma_k$ is the effect of the $k$-th analyst, $(\alpha\beta)_{ij}$ is the
TABLE 5.16
Data on the Melting Points of a Homogeneous Sample of Hydroquinone

              Thermometer 1          Thermometer 2          Thermometer 3
Week          1      2      3        1      2      3        1      2      3
Analyst 1     174.0  173.5  174.5    173.0  173.5  173.0    171.5  172.5  173.0
Analyst 2     173.0  173.0  173.5    172.0  173.0  173.5    171.0  172.0  171.5
Analyst 3     173.5  173.0  173.0    173.0  173.5  172.5    173.0  173.0  172.5

Source: Johnson and Leone (1977, p. 861). Used with permission.


TABLE 5.17
Sums over Analysts $y_{ij.}$

                    Week (j)
Thermometer (i)     1         2         3         $y_{i..}$
1                   520.5     519.5     521.0     1,561.0
2                   518.0     520.0     519.0     1,557.0
3                   515.5     517.5     517.0     1,550.0
$y_{.j.}$           1,554.0   1,557.0   1,557.0   $y_{...}$ = 4,668.0


interaction of the $i$-th thermometer with the $j$-th week, $(\alpha\gamma)_{ik}$ is the interaction
of the $i$-th thermometer with the $k$-th analyst, $(\beta\gamma)_{jk}$ is the interaction of the
$j$-th week with the $k$-th analyst, and $e_{ijk}$ is the customary error term. In order
to estimate the error variance, it is assumed that the thermometer × week × analyst
interaction is zero. Furthermore, it is assumed that the $\alpha_i$'s, $\beta_j$'s, $\gamma_k$'s, $(\alpha\beta)_{ij}$'s,
$(\alpha\gamma)_{ik}$'s, $(\beta\gamma)_{jk}$'s, and $e_{ijk}$'s are independently and normally distributed with
zero means and variances $\sigma_\alpha^2$, $\sigma_\beta^2$, $\sigma_\gamma^2$, $\sigma_{\alpha\beta}^2$, $\sigma_{\alpha\gamma}^2$, $\sigma_{\beta\gamma}^2$, and $\sigma_e^2$, respectively.

The first step in the analysis of variance computations consists of forming
sums over every index and every combination of indices. Thus, we sum over
analysts ($k$) to obtain a thermometer ($i$) × week ($j$) table containing
$y_{ij.} = \sum_k y_{ijk}$ (see Table 5.17). This table is then summed over the levels of week
($j$) to obtain the thermometer totals ($y_{i..}$) and again over thermometers ($i$) to
obtain the week totals ($y_{.j.}$). The sum of the thermometer totals equals the
sum of the week totals, which gives the grand total ($y_{...}$). Finally, Tables 5.18
and 5.19 are obtained in a similar way.

TABLE 5.18
Sums over Weeks $y_{i.k}$

                  Thermometer (i)
Analyst (k)       1         2         3         $y_{..k}$
1                 522.0     519.5     517.0     1,558.5
2                 519.5     518.5     514.5     1,552.5
3                 519.5     519.0     518.5     1,557.0
$y_{i..}$         1,561.0   1,557.0   1,550.0   $y_{...}$ = 4,668.0

TABLE 5.19
Sums over Thermometers $y_{.jk}$

                  Week (j)
Analyst (k)       1         2         3         $y_{..k}$
1                 518.5     519.5     520.5     1,558.5
2                 516.0     518.0     518.5     1,552.5
3                 519.5     519.5     518.0     1,557.0
$y_{.j.}$         1,554.0   1,557.0   1,557.0   $y_{...}$ = 4,668.0

With these preliminary results, the subsequent analysis of variance calculations
are rather straightforward. Thus, from the computational formulae of
Section 5.10, we have

$$SS_T = (174.0)^2 + (173.0)^2 + \cdots + (172.5)^2 - \frac{(4{,}668.0)^2}{3\times3\times3} = 807{,}061 - 807{,}045.333 = 15.667,$$

$$SS_A = \frac{(1{,}561.0)^2 + (1{,}557.0)^2 + (1{,}550.0)^2}{3\times3} - \frac{(4{,}668.0)^2}{3\times3\times3} = 807{,}052.222 - 807{,}045.333 = 6.889,$$

$$SS_B = \frac{(1{,}554.0)^2 + (1{,}557.0)^2 + (1{,}557.0)^2}{3\times3} - \frac{(4{,}668.0)^2}{3\times3\times3} = 807{,}046.000 - 807{,}045.333 = 0.667,$$

$$SS_C = \frac{(1{,}558.5)^2 + (1{,}552.5)^2 + (1{,}557.0)^2}{3\times3} - \frac{(4{,}668.0)^2}{3\times3\times3} = 807{,}047.500 - 807{,}045.333 = 2.167,$$

$$\begin{aligned}
SS_{AB} &= \frac{(520.5)^2 + (519.5)^2 + \cdots + (517.0)^2}{3} - \frac{(4{,}668.0)^2}{3\times3\times3} - SS_A - SS_B\\
&= 807{,}054.000 - 807{,}045.333 - 6.889 - 0.667 = 1.111,
\end{aligned}$$

$$\begin{aligned}
SS_{AC} &= \frac{(522.0)^2 + (519.5)^2 + \cdots + (518.5)^2}{3} - \frac{(4{,}668.0)^2}{3\times3\times3} - SS_A - SS_C\\
&= 807{,}056.500 - 807{,}045.333 - 6.889 - 2.167 = 2.111,
\end{aligned}$$

and

$$\begin{aligned}
SS_{BC} &= \frac{(518.5)^2 + (519.5)^2 + \cdots + (518.0)^2}{3} - \frac{(4{,}668.0)^2}{3\times3\times3} - SS_B - SS_C\\
&= 807{,}049.833 - 807{,}045.333 - 0.667 - 2.167 = 1.667.
\end{aligned}$$


Finally, by subtraction, the error (three-way interaction) sum of squares is given
by

$$SS_E = 15.667 - 6.889 - 0.667 - 2.167 - 1.111 - 2.111 - 1.667 = 1.055.$$
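The sums of squares above can be checked directly from the raw data of Table 5.16. The Python sketch below is an illustration rather than the book's procedure: the helper `three_way_ss` is a hypothetical name, and it simply applies the same "raw sum of squares minus correction term" formulae to a nested list indexed as thermometer, week, analyst (one observation per cell).

```python
# Melting points from Table 5.16, stored as y[i][j][k]:
# i = thermometer, j = week, k = analyst (three levels each).
y = [
    [[174.0, 173.0, 173.5], [173.5, 173.0, 173.0], [174.5, 173.5, 173.0]],
    [[173.0, 172.0, 173.0], [173.5, 173.0, 173.5], [173.0, 173.5, 172.5]],
    [[171.5, 171.0, 173.0], [172.5, 172.0, 173.0], [173.0, 171.5, 172.5]],
]


def three_way_ss(y):
    """Sums of squares for a three-way layout with one observation per cell."""
    a, b, c = len(y), len(y[0]), len(y[0][0])
    I, J, K = range(a), range(b), range(c)
    grand = sum(y[i][j][k] for i in I for j in J for k in K)
    ct = grand ** 2 / (a * b * c)  # correction term y...^2 / abc
    ss = {"T": sum(y[i][j][k] ** 2 for i in I for j in J for k in K) - ct}
    # Main effects: squared marginal totals divided by the number of observations in each
    ss["A"] = sum(sum(y[i][j][k] for j in J for k in K) ** 2 for i in I) / (b * c) - ct
    ss["B"] = sum(sum(y[i][j][k] for i in I for k in K) ** 2 for j in J) / (a * c) - ct
    ss["C"] = sum(sum(y[i][j][k] for i in I for j in J) ** 2 for k in K) / (a * b) - ct
    # Two-factor interactions: two-way table totals, minus the relevant main effects
    ss["AB"] = sum(sum(y[i][j][k] for k in K) ** 2 for i in I for j in J) / c - ct - ss["A"] - ss["B"]
    ss["AC"] = sum(sum(y[i][j][k] for j in J) ** 2 for i in I for k in K) / b - ct - ss["A"] - ss["C"]
    ss["BC"] = sum(sum(y[i][j][k] for i in I) ** 2 for j in J for k in K) / a - ct - ss["B"] - ss["C"]
    # Error (three-way interaction) sum of squares by subtraction
    ss["E"] = ss["T"] - sum(ss[key] for key in ("A", "B", "C", "AB", "AC", "BC"))
    return ss


ss = three_way_ss(y)
for key, value in ss.items():
    print(f"SS_{key} = {value:.3f}")
```

Carried at full precision, the error sum of squares comes out as 1.0556 (the SAS output prints 1.055556); the 1.055 above reflects subtracting already-rounded terms.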


These results, along with the remaining calculations, are shown in Table 5.20.
From Table 5.20 it is clear that the hypotheses on the two-factor interactions,
namely,

$$H_0^{AB}: \sigma_{\alpha\beta}^2 = 0 \quad\text{versus}\quad H_1^{AB}: \sigma_{\alpha\beta}^2 > 0,$$

$$H_0^{AC}: \sigma_{\alpha\gamma}^2 = 0 \quad\text{versus}\quad H_1^{AC}: \sigma_{\alpha\gamma}^2 > 0,$$

and

$$H_0^{BC}: \sigma_{\beta\gamma}^2 = 0 \quad\text{versus}\quad H_1^{BC}: \sigma_{\beta\gamma}^2 > 0,$$

are all tested against the error (thermometer × week × analyst interaction) term.
On examining the last two columns of Table 5.20, it is seen that only the
thermometer × analyst interaction is significant (p = 0.045).

Now, we note that there are no exact F tests for testing the main effects
hypotheses, namely,

$$H_0^{A}: \sigma_\alpha^2 = 0 \quad\text{versus}\quad H_1^{A}: \sigma_\alpha^2 > 0,$$

$$H_0^{B}: \sigma_\beta^2 = 0 \quad\text{versus}\quad H_1^{B}: \sigma_\beta^2 > 0,$$

and

$$H_0^{C}: \sigma_\gamma^2 = 0 \quad\text{versus}\quad H_1^{C}: \sigma_\gamma^2 > 0,$$

unless we are willing to assume that certain two-factor interactions are zero.
Since we have just concluded that some two-factor interactions are nonsignificant,
we can obtain exact tests for certain hypotheses by assuming the corresponding
two-factor interactions to be zero. But if we are unwilling to assume that
any two-factor interactions are zero, then, as we know, there are no exact F tests
available for this problem. We can, however, obtain pseudo-F tests as discussed
in Section 5.5. In the following, we illustrate the pseudo-F test for testing the
hypothesis $H_0^A: \sigma_\alpha^2 = 0$ versus $H_1^A: \sigma_\alpha^2 > 0$.


From Table 5.20, we obtain

$$E(MS_{AB}) + E(MS_{AC}) - E(MS_E) = \sigma_e^2 + 3\sigma_{\alpha\beta}^2 + 3\sigma_{\alpha\gamma}^2,$$

which is precisely equal to $E(MS_A)$ when $\sigma_\alpha^2 = 0$. Hence, the desired test
statistic is

$$F_A' = \frac{MS_A}{MS_{AB} + MS_{AC} - MS_E} = \frac{3.444}{0.278 + 0.528 - 0.132} = 5.11,$$

which has an approximate F distribution with 2 and $\nu_A$ degrees of freedom,
where $\nu_A$ (rounded to the nearest integer) is approximated by

$$\nu_A = \frac{(0.278 + 0.528 - 0.132)^2}{\dfrac{(0.278)^2}{4} + \dfrac{(0.528)^2}{4} + \dfrac{(-0.132)^2}{8}} \approx 5.$$


For a level of significance of 0.05, we find from Appendix Table V that
$F[2, 5; 0.95] = 5.79$. Since $F_A' = 5.11 < 5.79$, we do not reject $H_0^A$ and
conclude that thermometers do not have a significant effect on melting point
(p = 0.062). Similarly, approximate F tests can readily be performed for
$H_0^B$ and $H_0^C$, yielding $F_B' = 0.59$, $F_C' = 1.33$, $\nu_B = 5$, and $\nu_C = 6$. The resulting
p-values for $F_B'$ and $F_C'$ are 0.589 and 0.333, respectively, and we may conclude
that weeks and analysts also do not have any significant effect on the melting
point.
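The pseudo-F ratios and Satterthwaite degrees of freedom quoted above can be reproduced with the short Python sketch below. It uses the rounded mean squares from the text, with 4 degrees of freedom for each two-factor interaction and 8 for error (as implied by the 3 × 3 × 3 layout). The helper names `satterthwaite` and `pseudo_f` are illustrative, not part of any package, and p-values are not computed because that would require an F-distribution routine.

```python
# Rounded mean squares from the text and their degrees of freedom;
# A = thermometer, B = week, C = analyst.
MS = {"A": 3.444, "B": 0.333, "C": 1.083,
      "AB": 0.278, "AC": 0.528, "BC": 0.417, "E": 0.132}
DF = {"AB": 4, "AC": 4, "BC": 4, "E": 8}


def satterthwaite(terms):
    """Return (L, nu) for L = sum(coef * MS[key]) with Satterthwaite's
    approximate degrees of freedom nu = L^2 / sum((coef * MS)^2 / df)."""
    combo = sum(coef * MS[key] for coef, key in terms)
    denom = sum((coef * MS[key]) ** 2 / DF[key] for coef, key in terms)
    return combo, combo ** 2 / denom


def pseudo_f(main, inter1, inter2):
    """Pseudo-F for a main effect: MS_main / (MS_inter1 + MS_inter2 - MS_E)."""
    denom, nu = satterthwaite([(1, inter1), (1, inter2), (-1, "E")])
    return MS[main] / denom, nu


for main, i1, i2 in [("A", "AB", "AC"), ("B", "AB", "BC"), ("C", "AC", "BC")]:
    f, nu = pseudo_f(main, i1, i2)
    print(f"{main}: F' = {f:.2f}, nu = {nu:.2f} (rounded to {round(nu)})")
```

Before rounding, the degrees of freedom come out near 4.98, 4.88, and 5.73, which match the denominator degrees of freedom printed by SAS GLM for the same tests.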

Finally, we can also obtain the unbiased estimates of the variance components,
giving

$$\hat\sigma_e^2 = 0.132,$$

$$\hat\sigma_{\alpha\beta}^2 = \frac{0.278 - 0.132}{3} = 0.049,$$

$$\hat\sigma_{\alpha\gamma}^2 = \frac{0.528 - 0.132}{3} = 0.132,$$

$$\hat\sigma_{\beta\gamma}^2 = \frac{0.417 - 0.132}{3} = 0.095,$$

$$\hat\sigma_\alpha^2 = \frac{3.444 - 0.278 - 0.528 + 0.132}{3\times3} = 0.308,$$

$$\hat\sigma_\beta^2 = \frac{0.333 - 0.278 - 0.417 + 0.132}{3\times3} = -0.026,$$
TABLE 5.21
Data on Diamond Pyramid Hardness Number of Dental Fillings Made from Two
Alloys of Gold

              Alloy: Gold Foil       Alloy: Goldent
Method        1     2     3          1     2     3
Dentist 1     792   772   782        824   772   803
Dentist 2     803   752   715        803   772   707
Dentist 3     715   792   762        724   715   606
Dentist 4     673   657   690        946   743   245
Dentist 5     634   649   724        715   724   627

Source: Halvorsen (1991, p. 145). Used with permission.


and

$$\hat\sigma_\gamma^2 = \frac{1.083 - 0.528 - 0.417 + 0.132}{3\times3} = 0.030.$$


The variance component estimates are consistent with the results of the
tests of hypotheses. It should be noted, however, that although certain variance
components seem to be relatively large, the rather small number of error degrees
of freedom makes the corresponding F tests quite insensitive.
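The same estimates follow mechanically from the mean squares. The sketch below assumes the expected mean squares of the random model (coefficient 3 on each two-factor interaction component and 9 on each main-effect component, as also printed in the SAS and BMDP output). It equates each mean square to its expectation and reports negative values as they arise, without truncating them to zero. With these inputs, $(0.278 - 0.132)/3$ evaluates to about 0.049, which agrees with the TW component .04861 printed by BMDP 8V.

```python
# Rounded mean squares from the text (A = thermometer, B = week, C = analyst).
MS = {"A": 3.444, "B": 0.333, "C": 1.083,
      "AB": 0.278, "AC": 0.528, "BC": 0.417, "E": 0.132}
n = 3  # every factor has 3 levels and there is one observation per cell

# ANOVA (method-of-moments) estimators: solve E(MS) = observed MS, e.g.
# E(MS_AB) = sigma_e^2 + 3 sigma_ab^2 and
# E(MS_A)  = sigma_e^2 + 3 sigma_ab^2 + 3 sigma_ac^2 + 9 sigma_a^2.
est = {
    "e": MS["E"],
    "alpha*beta": (MS["AB"] - MS["E"]) / n,
    "alpha*gamma": (MS["AC"] - MS["E"]) / n,
    "beta*gamma": (MS["BC"] - MS["E"]) / n,
    "alpha": (MS["A"] - MS["AB"] - MS["AC"] + MS["E"]) / (n * n),
    "beta": (MS["B"] - MS["AB"] - MS["BC"] + MS["E"]) / (n * n),
    "gamma": (MS["C"] - MS["AC"] - MS["BC"] + MS["E"]) / (n * n),
}
for name, value in est.items():
    print(f"sigma^2_{name} = {value:.3f}")
```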


5.16 WORKED EXAMPLE FOR MODEL III 


The following example is based on an experiment in dentistry designed to 
study the hardness of gold filling material. The data are taken from Halvorsen 
(1991, p. 145). Five dentists were asked to prepare the six types of gold fill- 
ing material, sintered at the three temperatures, and using each of the three 
methods of condensation. The data in Table 5.21 represent a subset of the orig- 
inal data involving only two alloys of gold filling material and sintered at a 
single temperature. 

The data in Table 5.21 can be regarded as a three-way classification with one 
observation per cell. Note that the alloy and method are fixed effects and the 
dentist is a random effect. The mathematical model for this experiment would 
be 


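$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + e_{ijk}, \qquad \begin{cases} i = 1, 2,\\ j = 1, 2, 3,\\ k = 1, 2, 3, 4, 5,\end{cases}$$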
TABLE 5.22
Sums over Dentists $y_{ij.}$

              Method (j)
Alloy (i)     1        2        3        $y_{i..}$
Gold Foil     3,617    3,622    3,673    10,912
Goldent       4,012    3,726    2,988    10,726
$y_{.j.}$     7,629    7,348    6,661    $y_{...}$ = 21,638


where $\mu$ is the general mean, $\alpha_i$ is the effect of the $i$-th alloy, $\beta_j$ is the effect
of the $j$-th method, $\gamma_k$ is the effect of the $k$-th dentist, $(\alpha\beta)_{ij}$ is the interaction
of the $i$-th alloy with the $j$-th method, $(\alpha\gamma)_{ik}$ is the interaction of the $i$-th
alloy with the $k$-th dentist, $(\beta\gamma)_{jk}$ is the interaction of the $j$-th method with
the $k$-th dentist, and $e_{ijk}$ is the customary error term. In order to estimate the
error variance, it is assumed that the alloy × method × dentist interaction is zero.
Furthermore, it is assumed that the $\alpha_i$'s, $\beta_j$'s, and $(\alpha\beta)_{ij}$'s are constants subject
to the restrictions

$$\sum_{i=1}^{2} \alpha_i = \sum_{j=1}^{3} \beta_j = \sum_{i=1}^{2} (\alpha\beta)_{ij} = \sum_{j=1}^{3} (\alpha\beta)_{ij} = 0,$$

and the $\gamma_k$'s, $(\alpha\gamma)_{ik}$'s, $(\beta\gamma)_{jk}$'s, and $e_{ijk}$'s are independently and normally
distributed with mean zero and variances $\sigma_\gamma^2$, $\sigma_{\alpha\gamma}^2$, $\sigma_{\beta\gamma}^2$, and $\sigma_e^2$, respectively, and
subject to the restrictions

$$\sum_{i=1}^{2} (\alpha\gamma)_{ik} = \sum_{j=1}^{3} (\beta\gamma)_{jk} = 0 \qquad (k = 1, 2, 3, 4, 5).$$
i=l j=! 


The first step in the analysis of variance computations consists of forming
sums over every index and every combination of indices. Thus, we sum over
dentists ($k$) to obtain an alloy ($i$) × method ($j$) table containing $y_{ij.} = \sum_k y_{ijk}$
(see Table 5.22). We then sum this table over methods ($j$) to obtain the alloy
totals ($y_{i..}$) and again over alloys ($i$) to obtain the method totals ($y_{.j.}$). The
sum of the alloy totals equals the sum of the method totals, which gives the
grand total ($y_{...}$). Finally, Tables 5.23 and 5.24 are obtained in a similar way.

TABLE 5.23
Sums over Methods $y_{i.k}$

                Alloy (i)
Dentist (k)     Gold Foil   Goldent   $y_{..k}$
1               2,346       2,399     4,745
2               2,270       2,282     4,552
3               2,269       2,045     4,314
4               2,020       1,934     3,954
5               2,007       2,066     4,073
$y_{i..}$       10,912      10,726    $y_{...}$ = 21,638

TABLE 5.24
Sums over Alloys $y_{.jk}$

                Method (j)
Dentist (k)     1        2        3        $y_{..k}$
1               1,616    1,544    1,585    4,745
2               1,606    1,524    1,422    4,552
3               1,439    1,507    1,368    4,314
4               1,619    1,400      935    3,954
5               1,349    1,373    1,351    4,073
$y_{.j.}$       7,629    7,348    6,661    $y_{...}$ = 21,638

With these preliminary results, the subsequent analysis of variance calculations
are rather straightforward. Thus, from the computational formulae of
Section 5.10, we have

$$\begin{aligned}
SS_T &= (792)^2 + (803)^2 + \cdots + (627)^2 - \frac{(21{,}638)^2}{2\times3\times5}\\
&= 15{,}982{,}022 - 15{,}606{,}768.133 = 375{,}253.867,
\end{aligned}$$

$$\begin{aligned}
SS_A &= \frac{(10{,}912)^2 + (10{,}726)^2}{3\times5} - \frac{(21{,}638)^2}{2\times3\times5}\\
&= 15{,}607{,}921.333 - 15{,}606{,}768.133 = 1{,}153.200,
\end{aligned}$$

$$\begin{aligned}
SS_B &= \frac{(7{,}629)^2 + (7{,}348)^2 + (6{,}661)^2}{2\times5} - \frac{(21{,}638)^2}{2\times3\times5}\\
&= 15{,}656{,}366.600 - 15{,}606{,}768.133 = 49{,}598.467,
\end{aligned}$$

$$\begin{aligned}
SS_C &= \frac{(4{,}745)^2 + (4{,}552)^2 + \cdots + (4{,}073)^2}{2\times3} - \frac{(21{,}638)^2}{2\times3\times5}\\
&= 15{,}678{,}295.000 - 15{,}606{,}768.133 = 71{,}526.867,
\end{aligned}$$

$$\begin{aligned}
SS_{AB} &= \frac{(3{,}617)^2 + (3{,}622)^2 + \cdots + (2{,}988)^2}{5} - \frac{(21{,}638)^2}{2\times3\times5} - SS_A - SS_B\\
&= 15{,}719{,}973.200 - 15{,}606{,}768.133 - 1{,}153.200 - 49{,}598.467 = 62{,}453.400,
\end{aligned}$$

$$\begin{aligned}
SS_{AC} &= \frac{(2{,}346)^2 + (2{,}270)^2 + \cdots + (2{,}066)^2}{3} - \frac{(21{,}638)^2}{2\times3\times5} - SS_A - SS_C\\
&= 15{,}688{,}962.667 - 15{,}606{,}768.133 - 1{,}153.200 - 71{,}526.867 = 9{,}514.467,
\end{aligned}$$

$$\begin{aligned}
SS_{BC} &= \frac{(1{,}616)^2 + (1{,}606)^2 + \cdots + (1{,}351)^2}{2} - \frac{(21{,}638)^2}{2\times3\times5} - SS_B - SS_C\\
&= 15{,}815{,}112.000 - 15{,}606{,}768.133 - 49{,}598.467 - 71{,}526.867 = 87{,}218.533.
\end{aligned}$$


Finally, by subtraction, the error (three-way interaction) sum of squares is given
by

$$\begin{aligned}
SS_E &= 375{,}253.867 - 1{,}153.200 - 49{,}598.467 - 71{,}526.867 - 62{,}453.400\\
&\quad - 9{,}514.467 - 87{,}218.533\\
&= 93{,}788.933.
\end{aligned}$$
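As a check on the hand computations, the same raw-sum-minus-correction-term formulae can be applied to the 30 observations of Table 5.21. The Python sketch below is illustrative: `three_way_ss` is a hypothetical helper, and the data are stored as a nested list indexed by alloy (Gold Foil, Goldent), method, and dentist.

```python
# Hardness numbers from Table 5.21, stored as y[i][j][k]:
# i = alloy (0 = Gold Foil, 1 = Goldent), j = method, k = dentist.
y = [
    [[792, 803, 715, 673, 634],   # Gold Foil, method 1, dentists 1-5
     [772, 752, 792, 657, 649],   # Gold Foil, method 2
     [782, 715, 762, 690, 724]],  # Gold Foil, method 3
    [[824, 803, 724, 946, 715],   # Goldent, method 1
     [772, 772, 715, 743, 724],   # Goldent, method 2
     [803, 707, 606, 245, 627]],  # Goldent, method 3
]


def three_way_ss(y):
    """Sums of squares for a three-way layout with one observation per cell."""
    a, b, c = len(y), len(y[0]), len(y[0][0])
    I, J, K = range(a), range(b), range(c)
    grand = sum(y[i][j][k] for i in I for j in J for k in K)
    ct = grand ** 2 / (a * b * c)  # correction term y...^2 / abc
    ss = {"T": sum(y[i][j][k] ** 2 for i in I for j in J for k in K) - ct}
    ss["A"] = sum(sum(y[i][j][k] for j in J for k in K) ** 2 for i in I) / (b * c) - ct
    ss["B"] = sum(sum(y[i][j][k] for i in I for k in K) ** 2 for j in J) / (a * c) - ct
    ss["C"] = sum(sum(y[i][j][k] for i in I for j in J) ** 2 for k in K) / (a * b) - ct
    ss["AB"] = sum(sum(y[i][j][k] for k in K) ** 2 for i in I for j in J) / c - ct - ss["A"] - ss["B"]
    ss["AC"] = sum(sum(y[i][j][k] for j in J) ** 2 for i in I for k in K) / b - ct - ss["A"] - ss["C"]
    ss["BC"] = sum(sum(y[i][j][k] for i in I) ** 2 for j in J for k in K) / a - ct - ss["B"] - ss["C"]
    # Error (three-way interaction) sum of squares by subtraction
    ss["E"] = ss["T"] - sum(ss[key] for key in ("A", "B", "C", "AB", "AC", "BC"))
    return ss


ss = three_way_ss(y)
for key, value in ss.items():
    print(f"SS_{key} = {value:,.3f}")
```

The error sum of squares comes out as 93,788.933, the same value printed for Error in the SAS GLM output.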


These results, along with the remaining calculations, are shown in Table 5.25.

From Table 5.25 it is clear that all the two-factor interactions should be
tested against the error term. It is immediately evident that the alloy × dentist
and method × dentist interactions give F ratios less than one and are
clearly nonsignificant. Also, the alloy × method interaction has an F value of 2.66, which
again fails to reach the 5 percent level of significance (p = 0.130). The method
main effect, when tested against the method × dentist interaction, has an F
value of 2.27, which is also nonsignificant (p = 0.165). The alloy main effect,
tested against the alloy × dentist interaction, gives an F ratio of less than 1 and
is clearly nonsignificant (p = 0.523). The dentist effect, tested against error,
has an F ratio of 1.53, which is again nonsignificant (p = 0.282).

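One can also obtain unbiased estimates of the variance components $\sigma_e^2$, $\sigma_{\alpha\gamma}^2$,
$\sigma_{\beta\gamma}^2$, and $\sigma_\gamma^2$, yielding

$$\hat\sigma_e^2 = 11{,}723.617,$$

$$\hat\sigma_{\alpha\gamma}^2 = \frac{1}{3}(2{,}378.617 - 11{,}723.617) = -3{,}115.000,$$

$$\hat\sigma_{\beta\gamma}^2 = \frac{1}{2}(10{,}902.317 - 11{,}723.617) = -410.650,$$

and

$$\hat\sigma_\gamma^2 = \frac{1}{6}(17{,}881.717 - 11{,}723.617) = 1{,}026.350.$$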
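The F ratios and variance-component estimates for this mixed model can be recomputed from the mean squares in a few lines. This sketch follows the text: it uses the test denominators described there and the divisors 3, 2, and 6 shown in the estimates above. The mean squares are copied from the text (they match the SAS output), and the dictionary keys are illustrative labels only.

```python
# Mean squares from the dental-filling example (A = alloy, B = method, C = dentist).
MS = {"A": 1153.200, "B": 24799.233, "C": 17881.717,
      "AB": 31226.700, "AC": 2378.617, "BC": 10902.317, "E": 11723.617}

# F ratios with the denominators used in the text: the fixed main effects are
# tested against their interaction with the random dentist factor; the dentist
# effect and all two-factor interactions are tested against error.
F = {
    "A": MS["A"] / MS["AC"],
    "B": MS["B"] / MS["BC"],
    "C": MS["C"] / MS["E"],
    "AB": MS["AB"] / MS["E"],
    "AC": MS["AC"] / MS["E"],
    "BC": MS["BC"] / MS["E"],
}

# Variance-component estimates with the divisors shown in the text.
var = {
    "e": MS["E"],
    "alpha*gamma": (MS["AC"] - MS["E"]) / 3,
    "beta*gamma": (MS["BC"] - MS["E"]) / 2,
    "gamma": (MS["C"] - MS["E"]) / 6,
}

for key, value in F.items():
    print(f"F_{key} = {value:.2f}")
for key, value in var.items():
    print(f"sigma^2_{key} = {value:,.3f}")
```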


The negative estimates are probably an indication that the corresponding vari- 
ance components may be zero. The point estimates of variance components 
are consistent with the results on tests of hypotheses. Finally, the confidence 
limits on contrasts for the fixed effects can be constructed in the usual way. For 
example, to obtain 95 percent confidence limits for the difference between the 
two alloy effects, we have 


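$$\bar y_{1..} - \bar y_{2..} = \frac{10{,}912}{3\times5} - \frac{10{,}726}{3\times5} = 12.400,$$

$$\widehat{\mathrm{Var}}(\bar y_{1..} - \bar y_{2..}) = \frac{2}{3\times5}\, MS_{AC} = \frac{2}{3\times5}(2{,}378.617) = 317.149,$$

and

$$t[4, 0.975] = 2.776.$$

So the desired confidence limits are

$$12.400 \pm 2.776\sqrt{317.149} = (-37.037,\ 61.837).$$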
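The confidence-limit arithmetic can be verified as follows. This is a minimal sketch that takes the alloy totals, $MS_{AC}$ and the critical value $t[4, 0.975] = 2.776$ from the text rather than computing the t quantile; the 4 degrees of freedom are those of the alloy × dentist mean square.

```python
import math

y1, y2 = 10912, 10726   # alloy totals y_1.. and y_2.. (Table 5.22)
n = 3 * 5               # observations per alloy: 3 methods x 5 dentists
ms_ac = 2378.617        # alloy x dentist mean square, 4 degrees of freedom
t_crit = 2.776          # t[4, 0.975], copied from the text

diff = y1 / n - y2 / n            # difference between the two alloy means
var_diff = 2 / n * ms_ac          # estimated variance of that difference
half = t_crit * math.sqrt(var_diff)
lower, upper = diff - half, diff + half
print(f"{diff:.3f} +/- {half:.3f} -> ({lower:.3f}, {upper:.3f})")
```

Because the interval comfortably contains zero, it tells the same story as the nonsignificant F test for alloys.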


One can similarly obtain confidence limits for the difference between any two 
method effects based on the t test. However, since there may be more than one 
comparison of interest, the multiple comparison techniques should generally 
be preferred. 


5.17 USE OF STATISTICAL COMPUTING PACKAGES 


Three-way and higher-order factorial models can be analyzed using either SAS 
ANOVA or GLM procedures. For a balanced design, the recommended pro- 
cedure is ANOVA and for the unbalanced design, GLM must be used. The 
random and mixed model analyses can be handled by the use of RANDOM 
and TEST options. Approximate or pseudo- F tests can be carried out via GLM 
using the Satterthwaite procedure. For the estimation of variance components, 
PROC MIXED or VARCOMP must be used. For instructions regarding SAS 
commands see Section 11.1. 

Among the SPSS procedures, either ANOVA or MANOVA could be used for 
a fixed effects analysis although ANOVA would be simpler. For the analyses 
involving random and mixed effects models, MANOVA or GLM must be used. 
For the estimation of variance components, VARCOMP is the procedure of 
choice. For instructions regarding SPSS commands see Section 11.2. 
The BMDP programs described in Section 4.15 are also adequate for analyzing
three-way and higher-order factorial models. No new problems arise for
analyses involving higher-order factorials.


5.18 WORKED EXAMPLES USING STATISTICAL PACKAGES 


In this section, we illustrate the applications of statistical packages to per- 
form three-way analysis of variance for the data sets of examples presented 
in Sections 5.14 through 5.16. Figures 5.1 through 5.3 illustrate the program 
instructions and the output results for analyzing data in Tables 5.10, 5.16, and 
5.21 using SAS ANOVA/GLM, SPSS MANOVA/GLM, and BMDP 2V/8V 


DATA ELECCHRM;
INPUT TEMP TIME DEGR RESIST;
DATALINES;
1 1 1 17.9
. . .
3 3 2 26.9
;
PROC ANOVA;
CLASSES TEMP TIME DEGR;
MODEL RESIST=TEMP TIME DEGR TEMP*TIME TEMP*DEGR
      TIME*DEGR TEMP*TIME*DEGR;
RUN;

The SAS System
Analysis of Variance Procedure
CLASS    LEVELS   VALUES
TEMP     3        1 2 3
TIME     3        1 2 3
DEGR     2        1 2
NUMBER OF OBS. IN DATA SET=72

Dependent Variable: RESIST
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             17        508.40500      29.90618    101.89   0.0001
Error             54         15.85000       0.29352
Corrected Total   71        524.25500

R-Square   C.V.       Root MSE   RESIST Mean
0.969767   2.417732   0.5418     22.408

Source            DF   Anova SS    Mean Square
TEMP               2   410.69250   205.34625
TIME               2    80.54083    40.27042
DEGR               1     0.46722     0.46722
TEMP*TIME          4    14.81417     3.70354
TEMP*DEGR          2     0.31361     0.15681
TIME*DEGR          2     0.70194     0.35097
TEMP*TIME*DEGR     4     0.87472     0.21868

(i) SAS application: SAS ANOVA instructions and output for the three-way fixed effects
analysis of variance.


DATA LIST
 /TEMP 1 TIME 3 DEGR 5 RESIST 7-10(1).
BEGIN DATA.
1 1 1 17.9
. . .
3 3 2 26.9
END DATA.
MANOVA RESIST BY TEMP(1,3) TIME(1,3) DEGR(1,2).

[Output: Analysis of Variance-Design 1, tests of significance for RESIST using UNIQUE sums of squares, with rows WITHIN CELLS, TEMP, TIME, DEGR, TEMP BY TIME, TEMP BY DEGR, TIME BY DEGR, TEMP BY TIME BY DEGR, (Model), and (Total), followed by R-Squared.]

(ii) SPSS application: SPSS MANOVA instructions and output for the three-way fixed
effects analysis of variance.


FIGURE 5.1 Program Instructions and Output for the Three-Way Fixed Effects 
Analysis of Variance: Data on Average Resistivities (in m-ohms/cm*) for Elec- 
trolytic Chromium Plate Example (Table 5.10). 
/INPUT     FILE='C:\SAHAI\TEXTO\EJE12.TXT'.
           FORMAT=FREE.
           VARIABLES=4.
/VARIABLE  NAMES=TE,TI,DEG,RES.
/GROUP     VARIABLE=TE,TI,DEG.
           CODES(TE)=1,2,3.
           NAMES(TE)=F1,F2,F3.
           CODES(TI)=1,2,3.
           NAMES(TI)=H4,H8,H12.
           CODES(DEGR)=1,2.
           NAMES(DEGR)=YES,NO.
/DESIGN    DEPENDENT=RESIST.
/END
1 1 1 17.9
. . .
3 3 2 26.9

BMDP2V - ANALYSIS OF VARIANCE AND COVARIANCE WITH REPEATED MEASURES
Release: 7.0 (BMDP/DYNAMIC)
ANALYSIS OF VARIANCE FOR THE 1-ST DEPENDENT VARIABLE
THE TRIALS ARE REPRESENTED BY THE VARIABLES: RESIST

SOURCE        SUM OF SQUARES   D.F.   MEAN SQUARE   TAIL PROB.
MEAN             36153.60507      1   36153.60507       0.0000
TE                 410.69245      2     205.34623       0.0000
TI                  80.54083      2      40.27041       0.0000
DEG                  0.46722      1       0.46722       0.2125
TE TI               14.81417      4       3.70354       0.0000
TE DEG               0.31361      2       0.15681       0.5892
TI DEG               0.70194      2       0.35097       0.3104
TE TI DEG            0.87472      4       0.21868       0.5656
ERROR               15.85000     54       0.29352

(iii) BMDP application: BMDP 2V instructions and output for the three-way fixed
effects analysis of variance.


FIGURE 5.1 (continued) 


DATA MELTPOIN;
INPUT THERMOM WEEK ANALYST MELTINGP;
DATALINES;
1 1 1 174.0
1 1 2 173.0
. . .
3 3 3 172.5
;
PROC GLM;
CLASSES THERMOM WEEK ANALYST;
MODEL MELTINGP=THERMOM WEEK ANALYST THERMOM*WEEK
      THERMOM*ANALYST WEEK*ANALYST;
RANDOM THERMOM WEEK ANALYST THERMOM*WEEK
       THERMOM*ANALYST WEEK*ANALYST/TEST;
RUN;

The SAS System
General Linear Models Procedure
CLASS     LEVELS   VALUES
THERMOM   3        1 2 3
WEEK      3        1 2 3
ANALYST   3        1 2 3
NUMBER OF OBS. IN DATA SET=27

Dependent Variable: MELTINGP
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             18        14.611111      0.811728      6.15   0.0066
Error              8         1.055556      0.131944
Corrected Total   26        15.666667

R-Square   C.V.       Root MSE   MELTINGP Mean
0.932624   0.210101   0.3632     172.89

Source            DF   Type III SS   Mean Square   F Value   Pr > F
THERMOM            2     6.8888889     3.4444444     26.11   0.0003
WEEK               2     0.6666667     0.3333333      2.53   0.1411
ANALYST            2     2.1666667     1.0833333      8.21   0.0115
THERMOM*WEEK       4     1.1111111     0.2777778      2.11   0.1719
THERMOM*ANALYST    4     2.1111111     0.5277778      4.00   0.0453
WEEK*ANALYST       4     1.6666667     0.4166667      3.16   0.0780

Source            Type III Expected Mean Square
THERMOM           Var(Error) + 3 Var(THERMOM*ANALYST) + 3 Var(THERMOM*WEEK) + 9 Var(THERMOM)
WEEK              Var(Error) + 3 Var(WEEK*ANALYST) + 3 Var(THERMOM*WEEK) + 9 Var(WEEK)
ANALYST           Var(Error) + 3 Var(WEEK*ANALYST) + 3 Var(THERMOM*ANALYST) + 9 Var(ANALYST)
THERMOM*WEEK      Var(Error) + 3 Var(THERMOM*WEEK)
THERMOM*ANALYST   Var(Error) + 3 Var(THERMOM*ANALYST)
WEEK*ANALYST      Var(Error) + 3 Var(WEEK*ANALYST)

Source: THERMOM   Error: MS(THERMOM*WEEK) + MS(THERMOM*ANALYST) - MS(Error)
DF   Type III MS    Denominator DF   Denominator MS   F Value   Pr > F
2    3.4444444444   4.98             0.6736111111     5.1134    0.0621

Source: WEEK      Error: MS(THERMOM*WEEK) + MS(WEEK*ANALYST) - MS(Error)
DF   Type III MS    Denominator DF   Denominator MS   F Value   Pr > F
2    0.3333333333   4.88             0.5625           0.5926    0.5883

Source: ANALYST   Error: MS(THERMOM*ANALYST) + MS(WEEK*ANALYST) - MS(Error)
DF   Type III MS    Denominator DF   Denominator MS   F Value   Pr > F
2    1.0833333333   5.73             0.8125           1.3333    0.3346

(i) SAS application: SAS GLM instructions and output for the three-way random effects
analysis of variance.


FIGURE 5.2 Program Instructions and Output for the Three-Way Random Ef- 
fects Analysis of Variance: Data on the Melting Points of a Homogeneous Sample 
of Hydroquinone (Table 5.16). 
DATA LIST
 /THERMOM 1 WEEK 3 ANALYST 5 MELTINGP 7-11(1).
BEGIN DATA.
1 1 1 174.0
. . .
3 3 3 172.5
END DATA.
GLM MELTINGP BY THERMOM WEEK ANALYST
 /DESIGN THERMOM WEEK ANALYST THERMOM*WEEK THERMOM*ANALYST WEEK*ANALYST
 /RANDOM THERMOM WEEK ANALYST.

[Output: Tests of Between-Subjects Effects for MELTINGP. Each main effect is tested against a synthesized error term (THERMOM: MS(T*W) + MS(T*A) - MS(E) = .674, F = 5.113, Sig. = .062; WEEK: MS(T*W) + MS(W*A) - MS(E) = .562; ANALYST: MS(T*A) + MS(W*A) - MS(E) = .812), and each two-factor interaction is tested against MS(E) = .132. An Expected Mean Squares table of variance-component coefficients, based on the Type III sums of squares, follows.]

(ii) SPSS application: SPSS GLM instructions and output for the three-way random
effects analysis of variance.


/INPUT     FILE='C:\SAHAI\TEXTO\EJE13.TXT'.
           FORMAT=FREE.
           VARIABLES=3.
/VARIABLE  NAMES=A1,A2,A3.
/DESIGN    NAMES=T,W,A.
           LEVELS=3,3,3.
           RANDOM=T,W,A.
           MODEL='T,W,A'.
/END
174.0 173.0 173.5
. . .
173.0 171.5 172.5

BMDP8V - GENERAL MIXED MODEL ANALYSIS OF VARIANCE - EQUAL CELL SIZES
Release: 7.0 (BMDP/DYNAMIC)
ANALYSIS OF VARIANCE DESIGN
INDEX              T     W     A
NUMBER OF LEVELS   3     3     3
POPULATION SIZE    INF   INF   INF
MODEL T, W, A

ANALYSIS OF VARIANCE FOR DEPENDENT VARIABLE 1
SOURCE    MEAN SQUARE   F      PROB.    EXPECTED MEAN SQUARE                      ESTIMATES OF VARIANCE COMPONENTS
MEAN      807045.33                     27(1)+9(2)+9(3)+9(4)+3(5)+3(6)+3(7)+(8)   29890.42824
THERMOM   3.4444444                     9(2)+3(5)+3(6)+(8)                        0.30787
WEEK      0.3333333                     9(3)+3(5)+3(7)+(8)                        -0.02546
ANALYST   1.0833333                     9(4)+3(6)+3(7)+(8)                        0.03009
TW        0.2777778     2.11   0.1719   3(5)+(8)                                  0.04861
TA        0.5277778     4.00   0.0453   3(6)+(8)                                  0.13194
WA        0.4166667     3.16   0.0780   3(7)+(8)                                  0.09491
TWA       0.1319444                     (8)                                       0.13194

(iii) BMDP application: BMDP 8V instructions and output for the three-way random
effects analysis of variance.


FIGURE 5.2 (continued) 


procedures. The typical output provides the data format listed at the top, all
cell means, and the entries of the analysis of variance table. Note that in each
case the results are the same as those obtained by the manual computations in
Sections 5.14 through 5.16. However, certain tests of significance in a mixed
model may differ from one program to another, since the programs make
different model assumptions.
DATA DIAMOND;
INPUT ALLOY METHOD DENTIST NUMBER;
DATALINES;
1 1 1 792
1 1 2 803
. . .
2 3 5 627
;
PROC GLM;
CLASSES ALLOY METHOD DENTIST;
MODEL NUMBER=ALLOY METHOD DENTIST ALLOY*METHOD
      ALLOY*DENTIST METHOD*DENTIST;
RANDOM DENTIST ALLOY*DENTIST METHOD*DENTIST;
TEST H=ALLOY E=ALLOY*DENTIST;
TEST H=METHOD E=METHOD*DENTIST;
RUN;

The SAS System
General Linear Models Procedure
CLASS     LEVELS   VALUES
ALLOY     2        1 2
METHOD    3        1 2 3
DENTIST   5        1 2 3 4 5
NUMBER OF OBS. IN DATA SET=30

Dependent Variable: NUMBER
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             21     281464.93333   13403.09206      1.14   0.4473
Error              8      93788.93333   11723.61667
Corrected Total   29     375253.86667

R-Square   C.V.        Root MSE    NUMBER Mean
0.750065   15.011875   108.27565   721.26666667

Source            DF   Type III SS   Mean Square   Pr > F
ALLOY              1      1153.200      1153.200   0.7618
METHOD             2     49598.467     24799.233   0.1830
DENTIST            4     71526.867     17881.717   0.2829
ALLOY*METHOD       2     62453.400     31226.700   0.1298
ALLOY*DENTIST      4      9514.467      2378.617   0.9297
METHOD*DENTIST     8     87218.533     10902.317   0.5396

Source            Type III Expected Mean Square
ALLOY             Var(Error) + 3 Var(ALLOY*DENTIST) + Q(ALLOY, ALLOY*METHOD)
METHOD            Var(Error) + 2 Var(METHOD*DENTIST) + Q(METHOD, ALLOY*METHOD)
DENTIST           Var(Error) + 2 Var(METHOD*DENTIST) + 3 Var(ALLOY*DENTIST) + 6 Var(DENTIST)
ALLOY*METHOD      Var(Error) + Q(ALLOY*METHOD)
ALLOY*DENTIST     Var(Error) + 3 Var(ALLOY*DENTIST)
METHOD*DENTIST    Var(Error) + 2 Var(METHOD*DENTIST)

Tests of Hypotheses using the Type III MS for ALLOY*DENTIST as an error term
Source   DF   Type III SS   Mean Square   F Value   Pr > F
ALLOY     1      1153.200      1153.200      0.48   0.5246

Tests of Hypotheses using the Type III MS for METHOD*DENTIST as an error term
Source   DF   Type III SS   Mean Square   F Value   Pr > F
METHOD    2     49598.467     24799.233      2.27   0.1651

(i) SAS application: SAS GLM instructions and output for the three-way mixed effects
analysis of variance.


DATA LIST
 /ALLOY 1 METHOD 3 DENTIST 5 NUMBER 7-9.
BEGIN DATA.
1 1 1 792
. . .
2 3 5 627
END DATA.
GLM NUMBER BY ALLOY METHOD DENTIST
 /DESIGN ALLOY METHOD DENTIST ALLOY*METHOD ALLOY*DENTIST METHOD*DENTIST
 /RANDOM DENTIST.

[Output: Tests of Between-Subjects Effects for NUMBER. ALLOY is tested against MS(A*D) = 2378.617 (F = .485, Sig. = .525), METHOD against MS(M*D) = 10902.317, DENTIST against the synthesized term MS(A*D) + MS(M*D) - MS(E) = 1557.317, and each two-factor interaction against MS(E) = 11723.617. An Expected Mean Squares table of variance-component and quadratic-term coefficients, based on the Type III sums of squares, follows.]

(ii) SPSS application: SPSS GLM instructions and output for the three-way mixed
effects analysis of variance.


FIGURE 5.3. Program Instructions and Output for the Three-Way Mixed Effects 
Analysis of Variance: Data on Diamond Pyramid Hardness Number of Dental 
Fillings Made from Two Alloys of Gold (Table 5.21). 
/INPUT     FILE='C:\SAHAI\TEXTO\EJE14.TXT'.
           FORMAT=FREE.
           VARIABLES=5.
/VARIABLE  NAMES=D1,D2,D3,D4,D5.
/DESIGN    NAMES=ALLOY,METHOD,DENTIST.
           LEVELS=2,3,5.
           RANDOM=DENTIST.
           FIXED=ALLOY,METHOD.
           MODEL='A,M,D'.
/END
792 803 715 673 634
772 752 792 657 649
782 715 762 690 724
824 803 724 946 715
772 772 715 743 724
803 707 606 245 627

BMDP8V - GENERAL MIXED MODEL ANALYSIS OF VARIANCE - EQUAL CELL SIZES
Release: 7.0 (BMDP/DYNAMIC)
ANALYSIS OF VARIANCE DESIGN
INDEX              A   M   D
NUMBER OF LEVELS   2   3   5
POPULATION SIZE    2   3   INF
MODEL A, M, D

ANALYSIS OF VARIANCE FOR DEPENDENT VARIABLE 1
SOURCE    MEAN SQUARE   EXPECTED MEAN SQUARE   ESTIMATES OF VARIANCE COMPONENTS
MEAN      15606768.     30(1)+6(4)             519629.54722
ALLOY     1153.         15(2)+3(6)             -81.69444
METHOD    24799.        10(3)+2(7)             1389.69167
DENTIST   17881.        6(4)                   2980.28611
AM        31226.        5(5)+(8)               3900.61667
AD        2378.         3(6)                   792.87222
MD        10902.        2(7)                   5451.15833
AMD       11723.        (8)                    11723.61667

(iii) BMDP application: BMDP 8V instructions and output for the three-way mixed
effects analysis of variance.


FIGURE 5.3 (continued) 


EXERCISES 


1. A study was performed on 18 oxides and 18 hydroxides to compare 
the effect of corrosion on three metal products. Six samples from 
each oxide and each hydroxide of each metal product were assigned 
at random into six groups of three each. Three groups from each oxide 
and hydroxide had density measurements taken after a test period of 
30 hours and the other three groups were measured after a test period 
of 60 hours. The relevant data are given as follows. 


                     Metal Products
                     Steel          Copper         Zinc
                     Corrosive      Corrosive      Corrosive
                     Element        Element        Element
Test Period (hrs)    O2    OH       O2    OH       O2    OH
30                   159   134      168   143      139   122
                     135   154      139   145      134   127
                     125   148      167   169      165   117
60                   152   120      112   152      126   113
                     142   163      152   127      151   146
                     124   150      117   151      138   126


(a) Describe the mathematical model and the assumptions for the 
experiment. 
(b) Analyze the data and report the analysis of variance table. 
(c) Test whether there are differences in the effect of corrosion
among the three metal products. Use α = 0.05.

(d) Test whether there are differences in the effect of corrosion
between the two test periods. Use α = 0.05.

(e) Test whether there are differences in the effect of corrosion
between oxides and hydroxides. Use α = 0.05.

(f) Test the significance of different interaction effects. Use α = 0.05.

2. The percentage of silicon carbide (SiC) concentration in an aluminum- 
silicon carbide (Al-SiC) composite, the fusion temperature, and the 
casting time of Al-SiC are being investigated for their effects on the 
strength of Al-SiC. Three levels of SiC concentration, three levels of 
fusion temperature, and two casting times are selected. A factorial 
experiment with two replicates was conducted and the following data 
in certain standard units (MPa) were obtained.


                                   Casting Time
                           1                           2
Silicon Carbide     Temperature (°C)            Temperature (°C)
Concentration (%)   1300   1400   1500          1300   1400   1500
10                  186.6  187.7  189.8         188.4  189.6  190.6
                    186.0  186.0  189.4         188.6  190.4  190.9
15                  188.5  186.0  188.4         187.5  188.7  189.6
                    187.2  186.9  187.6         188.1  188.0  189.0
20                  187.5  185.6  187.4         187.6  187.0  188.5
                    186.6  186.2  188.1         189.4  187.8  189.8


(a) State the model and the assumptions for the experiment. All 
factors may be regarded as fixed effects. 

(b) Analyze the data and report the analysis of variance table. 

(c) Test whether there are differences in the strength of the Al-SiC
composite due to the casting time. Use α = 0.05.

(d) Test whether there are differences in the strength of the Al-SiC
composite due to the temperature. Use α = 0.05.

(e) Test whether there are differences in the strength of the Al-SiC
composite due to the percentage of silicon carbide concentration.
Use α = 0.05.

(f) Test the significance of different interaction effects. Use α = 0.05.

3. The production management department of a textile factory is study- 
ing the effect of several factors on the color of garments used to 
manufacture ladies’ dresses. Three machinists, three production cir- 
cuit times, and two relative humidities were selected and three small 
samples of garments were colored under each set of conditions. The 
completed garment was compared to a standard and a range of scale was
assigned.


The Analysis of Variance


The data in certain standard units are given as follows.


                        Relative Humidity
                     Low                   High
Production        Machinist             Machinist
Circuit Time     1    2    3           1    2    3
  40            28   32   36          29   43   39
                29   33   37          28   41   4?
                30    ?    ?           ?   40   4?
  50            41   39   38          42   39   39
                40   43   39          44   43   41
                41   44   40          40   41   36
  60            33   40   31          31   41   33
                29   40   32          34   42   31
                32   39   30          33   39   29

(Entries marked ? are illegible in the source.)


(a) State the model and the assumptions for the experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Test whether there are differences in coloring due to relative
humidity. Use α = 0.05.

(d) Test whether there are differences in coloring due to machinist.
Use α = 0.05.

(e) Test whether there are differences in coloring due to production
circuit time. Use α = 0.05.

(f) Test the significance of different interaction effects. Use α = 0.05.

(g) Do exact tests exist for all effects? If not, use the pseudo-F tests
discussed in this chapter.


4. Consider the following data from a factorial experiment involving
three factors A, B, and C, where all factors are considered as having
fixed effects.


              C1                    C2                    C3
A1     20.1  19.7  20.8     21.7  19.3  18.3     20.7  20.4  24.1
       33.4  18.5  14.7     20.3  17.8  16.7     19.2  18.6  18.6
       27.2  17.3  18.5     19.4  18.1  15.2     18.1  17.5  16.2
A2     26.4  22.3  21.2     23.8  20.3  17.3     17.6  22.4  12.7
       19.5  20.4  19.6     22.4  22.1  18.5     19.1  20.7  16.3
       23.3  19.3  18.3     21.2  23.5  20.3     20.8  19.4  17.1


(a) State the model and the assumptions for the experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Carry out tests of significance on all interactions. Use α = 0.05.

(d) Carry out tests of significance on the main effects. Use α = 0.05.

(e) Give an illustration of how a significant interaction has masked 
the effect of factor C. 
5. Consider a factorial experiment involving three factors A, B, and C,
and assume a three-way fixed effects model of the form

$$y_{ijk\ell} = \mu + \alpha_i + \beta_j + \gamma_k + (\beta\gamma)_{jk} + e_{ijk\ell} \qquad (i = 1,2,3,4;\ j = 1,2;\ k = 1,2,3;\ \ell = 1,2).$$


It is assumed that all other interactions are either nonexistent or neg- 
ligible. The relevant data are given as follows. 


               B1                      B2
        C1     C2     C3        C1     C2     C3
A1     5.0    4.5    4.8       5.5    4.2    4.2
       5.8    5.2    5.4       4.3    4.4    4.8
A2     4.7    3.8    4.2       3.8    3.8    4.8
       4.8    4.2    4.3       4.1    4.2    5.3
A3     5.7    4.5    4.7       4.5    3.9    3.9
       5.8    4.7    5.5       4.7    4.5    4.6
A4     4.5    4.3    4.2       2.3    3.8    4.5
       4.3    3.7    4.4       4.4    4.3    5.4


(a) State the model and the assumptions for the experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform a test of significance on the B × C interaction. Use
α = 0.05.

(d) Perform tests of significance on the main effects, A, B, and C,
using a pooled error mean square. Use α = 0.05.

(e) Are two observations for each treatment combination sufficient
if the power of the test for detecting differences among the levels
of factor C at the 0.05 level of significance is to be at least 0.8
when $\gamma_1 = -0.2$, $\gamma_2 = 0.4$, and $\gamma_3 = -0.2$? Use the same pooled
estimate of $\sigma^2$ as obtained in the analysis of variance.

6. Ostle (1952) discussed the results of an analysis of variance of a
three-factor factorial design. The experiment consisted of determining
soluble matter in four extract solutions by pipetting in duplicate
25, 50, and 100 ml volumes of solution into dishes. The solution was
evaporated and weighed for residues. The experiment was replicated
by repeating it on three days. The researcher is interested in only
the four extracts and only the three volumes used in the experiment.
However, the days are considered to be a random sample of days.
The mathematical model for this experiment would be:


$$y_{ijk\ell} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + e_{ijk\ell},$$
$$i = 1,2,3,4;\quad j = 1,2,3;\quad k = 1,2,3;\quad \ell = 1,2,$$

where $\mu$ is the general mean, $\alpha_i$ is the effect of the i-th extract, $\beta_j$ is
the effect of the j-th volume, $\gamma_k$ is the effect of the k-th day, $(\alpha\beta)_{ij}$
is the interaction of the i-th extract with the j-th volume, $(\alpha\gamma)_{ik}$ is
the interaction of the i-th extract with the k-th day, $(\beta\gamma)_{jk}$ is the
interaction of the j-th volume with the k-th day, $(\alpha\beta\gamma)_{ijk}$ is the
interaction of the i-th extract with the j-th volume and the k-th day,
and $e_{ijk\ell}$ is the customary error term. Furthermore, it is assumed that
the $\alpha_i$'s, $\beta_j$'s, and $(\alpha\beta)_{ij}$'s are constants subject to the restrictions:


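$$\sum_{i=1}^{4}\alpha_i = \sum_{j=1}^{3}\beta_j = 0, \qquad \sum_{i=1}^{4}(\alpha\beta)_{ij} = \sum_{j=1}^{3}(\alpha\beta)_{ij} = 0;$$

and the $\gamma_k$'s, $(\alpha\gamma)_{ik}$'s, $(\beta\gamma)_{jk}$'s, and $e_{ijk\ell}$'s are independently and
normally distributed with mean zero and variances $\sigma_\gamma^2$, $\sigma_{\alpha\gamma}^2$, $\sigma_{\beta\gamma}^2$, and
$\sigma_e^2$, respectively; and subject to the restrictions:

$$\sum_{i=1}^{4}(\alpha\gamma)_{ik} = \sum_{j=1}^{3}(\beta\gamma)_{jk} = 0 \qquad (k = 1,2,3).$$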


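The analysis of variance computations of the data (not reported here)
are carried out exactly as in Section 5.16 and the results are given as
follows.

Source of                 Degrees of   Mean       Expected
Variation                 Freedom      Square     Mean Square
Extract                    3           161.5964   $\sigma_e^2 + 3 \times 2\,\sigma_{\alpha\gamma}^2 + \frac{3 \times 3 \times 2}{3}\sum_{i=1}^{4}\alpha_i^2$
Volume                     2           0.02443    $\sigma_e^2 + 4 \times 2\,\sigma_{\beta\gamma}^2 + \frac{4 \times 3 \times 2}{2}\sum_{j=1}^{3}\beta_j^2$
Day                        2           0.07535    $\sigma_e^2 + 4 \times 3 \times 2\,\sigma_\gamma^2$
Extract × Volume           6           0.00772    $\sigma_e^2 + 2\,\sigma_{\alpha\beta\gamma}^2 + \frac{3 \times 2}{3 \times 2}\sum_{i=1}^{4}\sum_{j=1}^{3}(\alpha\beta)_{ij}^2$
Extract × Day              6           0.03959    $\sigma_e^2 + 3 \times 2\,\sigma_{\alpha\gamma}^2$
Volume × Day               4           0.01501    $\sigma_e^2 + 4 \times 2\,\sigma_{\beta\gamma}^2$
Extract × Volume × Day    12           0.00654    $\sigma_e^2 + 2\,\sigma_{\alpha\beta\gamma}^2$
Error                     36           0.00565    $\sigma_e^2$
Total                     71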


Source: Ostle (1952). Used with permission. 
(a) Test whether there are differences in soluble matter among different
extracts. Use α = 0.05.

(b) Test whether there are differences in soluble matter among different
volumes. Use α = 0.05.

(c) Test whether there are differences in soluble matter among different
days. Use α = 0.05.

(d) Test for the following interaction effects: extract × volume,
extract × day, volume × day, and extract × volume × day. Use
α = 0.05.

(e) Determine appropriate point and interval estimates for each of
the variance components of the model.

(f) It is found that the effects due to day, extract × day, and volume ×
day are all significant; that is, the method is unreliable in that
the differences among volumes will not be the same on different
days. Give one possible explanation of the excessive variation
among days. How might this difficulty be overcome?

7. Damon and Harvey (1987, p. 316) reported data from an experiment
with radishes involving a three-factor factorial design. There were
two sources of nitrogen (ammonium sulfate and potassium nitrate),
three levels of nitrogen, and two levels of treatment (nitrapyrin and
no nitrapyrin). In the following data, the four observations in each
cell are the fresh weights of the plants in grams/pot.


                          Source of Nitrogen
               Ammonium Sulfate               Potassium Nitrate
Level     Nitrapyrin   No Nitrapyrin     Nitrapyrin   No Nitrapyrin
1           14.3          17.5             17.6          13.9
            15.9          16.7             24.0          16.8
            14.8          15.7             13.5          17.3
            20.8          15.1             17.9          12.6
2           37.5          39.6             43.3          45.8
            29.4          33.0             53.5          46.9
            33.8          52.8             49.3          48.0
            33.1          36.2             49.9          47.0
3           41.4          52.5             59.8          77.8
            49.9          53.4             98.4          87.6
            43.2          51.7             79.4          83.4
            40.1          52.2             80.0          84.7


Source: Damon and Harvey (1987, p. 316). Used with permission. 


(a) State the model and the assumptions for the experiment. Consider 
it to be a fixed effects model. 

(b) Analyze the data and report the analysis of variance table. 

(c) Test whether there are differences in weights due to source of
nitrogen. Use α = 0.05.

(d) Test whether there are differences in weights due to levels of
nitrogen. Use α = 0.05.

(e) Test whether there are differences in weights due to levels of
treatment. Use α = 0.05.

(f) Test the significance of different interaction effects. Use α =
0.05.


8. Scheffé (1959, pp. 145-146) reported data from an experiment conducted
by Otto Dykstra, Jr., at the Research Center of the General
Food Corporation, to study variation in moisture content of a certain
food product. A four-factor factorial design, involving three kinds of
salt, three amounts of salt, two amounts of acid, and two types of
additives, was used. The moisture content (in grams) of samples in
the experimental stage was recorded and the data are given as follows.


                                 Amount of Acid
                            1                        2
Kind of   Amount     Type of Additive         Type of Additive
Salt      of Salt      1        2               1        2
1         1            8        5               8        4
          2           17       11              13       10
          3           22       16              20       15
2         1            7        3              10        5
          2           26       17              24       19
          3           34       32              34       29
3         1           10        5               9        4
          2           24       14              24       16
          3           39       33              36       34


Source: Scheffé, (1959, p. 145). Used with permission. 


(a) State the model and the assumptions for the experiment. Consider
it to be a fixed effects model. Since there is no replication, one
must decide which error term to use.

(b) Analyze the data as the full factorial model and report the analysis
of variance table.

(c) Test whether there are differences in moisture content due to kind
of salt. Use α = 0.05.

(d) Test whether there are differences in moisture content due to
amount of salt. Use α = 0.05.

(e) Test whether there are differences in moisture content due to
amount of acid. Use α = 0.05.

(f) Test whether there are differences in moisture content due to type
of additives. Use α = 0.05.

(g) Test the significance of different interaction effects. Use α =
0.05.
9. Davies (1956, p. 275) reported data from a 2^4 factorial experiment
designed to investigate the effect of acid strength, time of reaction,
amount of acid, and temperature of reaction on the yield of an isatin
derivative. In the following data, the observations are the yield of
the isatin derivative measured in gms/100 gm of base material.

                                 Temperature of Reaction
                             60°C                      70°C
                        Amount of Acid            Amount of Acid
Acid       Reaction
Strength   Time        35 ml     45 ml           35 ml     45 ml
87%        15 min      6.08      6.31            6.79      6.77
           30 min      6.53      6.12            6.73      6.49
93%        15 min      6.04      6.09            6.68      6.38
           30 min      6.43      6.36            6.08      6.23


Source: Davies (1956, p. 275). Used with permission.


(a) State the model and the assumptions for the experiment. Consider
it to be a fixed effects model.

(b) Analyze the data assuming that three- and four-factor interactions
are negligible and report the analysis of variance table. (Davies
stated that on technical grounds, the existence of three- and
four-factor interactions is unlikely.)

(c) Test whether there are differences in yield due to acid strength.
Use α = 0.05.

(d) Test whether there are differences in yield due to reaction time.
Use α = 0.05.

(e) Test whether there are differences in yield due to amount of acid.
Use α = 0.05.

(f) Test whether there are differences in yield due to temperature of
reaction. Use α = 0.05.

(g) Test the significance of two-factor interaction effects. Use α =
0.05.
6.0 PREVIEW


In Chapters 3 through 5 we considered analysis of variance for experiments
commonly referred to as crossed classifications. In a crossed classification,
data cells are formed by combining each level of one factor with each level
of every other factor. We now consider experiments involving two factors such
that the levels of one factor occur only within the levels of another factor. Here,
the levels of a given factor are all different across the levels of the other factor.
More specifically, given two factors A and B, the levels of B are said to be nested
within the levels of A, or more briefly B is nested within A, if every level of
B appears with only a single level of A in the observations. This means that if
the factor A has a levels, then the levels of B fall into a sets of $b_1, b_2, \ldots, b_a$
levels, respectively, such that the i-th set appears with the i-th level of A. These
designs are commonly known as nested or hierarchical designs where the levels
of factor B are nested within the levels of factor A.

For example, suppose an industrial firm procures a certain liquid chemical
from three different locations. The firm wishes to investigate whether the strength of
the chemical is the same from each location. There are four barrels of chemicals
available from each location and three measurements of strength are to be
made from each barrel. The physical layout can be schematically represented
as in Figure 6.1. This is a two-way nested or hierarchical design, with barrels
nested within locations. In the first instance, one may ask why the two factors,
locations and barrels, are not crossed. If the factors were crossed, then barrel 1
would always refer to the same barrel, barrel 2 would always refer to the same
barrel, and so on. In this example, this is clearly not the situation since the
barrels from each location are unique for that particular location. Thus, barrel
1 from location I has no relation to barrel 1 from any other location, and so
on. To emphasize the point that barrels from each location are different barrels,
we may recode the barrels as 1, 2, 3, and 4 from location I; 5, 6, 7, and 8 from
location II; and 9, 10, 11, and 12 from location III. For another example, suppose
that in order to study a certain characteristic of a product, samples of size 3
are taken from each of four spindles within each of three machines. Here, each
separate spindle appears within a single machine and thus spindles are nested
within machines. Again, the layout may be depicted as shown in Figure 6.2.

[Diagram: locations I, II, III, each branching into its own barrels.]
FIGURE 6.1 A Layout for the Two-Way Nested Design Where Barrels Are Nested
within Locations.

[Diagram: machines I, II, III, each branching into its own spindles, with samples from each spindle.]
FIGURE 6.2 A Layout for the Two-Way Nested Design Where Spindles Are Nested
within Machines.
Both nested and crossed factors can occur in an experimental design. When 
each of the factors in an experiment is progressively nested within the preced- 
ing factor, it is called a completely or hierarchically nested design. Such nested 
designs are common in many fields of study and are particularly popular in
surveys and industrial experiments similar to the ones previously described. In this
and the following chapter, we consider only the hierarchically nested designs.
The experiments involving both nested and crossed factors are considered in
Chapter 8.

TABLE 6.1
Data for a Two-Way Nested Classification

           A_1                          A_2                   ...          A_a
B_11   B_12   ...  B_1b      B_21   B_22   ...  B_2b      ...   B_a1   B_a2   ...  B_ab
y_111  y_121  ...  y_1b1     y_211  y_221  ...  y_2b1     ...   y_a11  y_a21  ...  y_ab1
y_112  y_122  ...  y_1b2     y_212  y_222  ...  y_2b2     ...   y_a12  y_a22  ...  y_ab2
 ...    ...        ...        ...    ...        ...              ...    ...        ...
y_11k  y_12k  ...  y_1bk     y_21k  y_22k  ...  y_2bk     ...   y_a1k  y_a2k  ...  y_abk
 ...    ...        ...        ...    ...        ...              ...    ...        ...
y_11n  y_12n  ...  y_1bn     y_21n  y_22n  ...  y_2bn     ...   y_a1n  y_a2n  ...  y_abn


Remarks: (i) To distinguish the crossed classification from the nested classification,
we can say that factors are crossed if neither is nested within the other.

(ii) When uncertain as to whether a factor is crossed or nested, try to renumber the 
levels of each factor. If the levels of the factor can be renumbered arbitrarily, then the 
factor is considered nested. 

(iii) The nesting of factors can also occur in an experiment when the procedure restricts 
the randomization of factor-level combinations. 

(iv) The experiments involving one-way classification can be thought of as one-way 
nested classification, where a factor corresponding to “replications” is nested within the 
main treatment factor of the experiment. 


6.1 MATHEMATICAL MODEL 


Consider two factors A and B having a and b levels, respectively. Let the b levels
of factor B be nested under each level of factor A and let there be n replicates at
each level of factor B. Let $y_{ijk}$ be the observed score corresponding to the i-th
level of factor A, j-th level of factor B nested within the i-th level of factor A,
and the k-th replicate within the j-th level of B. The data involving the total of
$N = abn$ scores $y_{ijk}$'s can then be schematically represented as in Table 6.1.

The analysis of variance model for this type of experimental layout is taken as 


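$$y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + e_{k(ij)}, \qquad \begin{cases} i = 1, 2, \ldots, a \\ j = 1, 2, \ldots, b \\ k = 1, 2, \ldots, n \end{cases} \qquad (6.1.1)$$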
where $-\infty < \mu < \infty$ is the overall mean, $\alpha_i$ is the effect due to the i-th level
of factor A, $\beta_{j(i)}$ is the effect due to the j-th level of factor B nested within
the i-th level of factor A, and $e_{k(ij)}$ is the error term that takes into account the
random variation within a particular cell. The subscript j(i) means that the j-th
level of B is nested within the i-th level of A. Note that in the nested model
(6.1.1) the main effects for factor B are missing since the different levels of
factor B are not the same for different levels of factor A. Furthermore, note that
the model has no interaction term between A and B since no level of factor
B appears with more than one level of factor A.


6.2 ASSUMPTIONS OF THE MODEL 


For all models given by (6.1.1), it is assumed that the errors $e_{k(ij)}$'s are
uncorrelated and randomly distributed with mean zero and variance $\sigma_e^2$. Other
assumptions depend on whether the levels of A and B are fixed or random. If
both factors A and B are fixed, we assume that

$$\sum_{i=1}^{a}\alpha_i = 0, \qquad \sum_{j=1}^{b}\beta_{j(i)} = 0 \quad (i = 1, 2, \ldots, a).$$

That is, the A factor effects sum to zero and the B factor effects sum to zero
within each level of A. Alternatively, if both A and B are random, then we
assume that the $\alpha_i$'s and $\beta_{j(i)}$'s are mutually and completely uncorrelated and
are randomly distributed with mean zero and variances $\sigma_\alpha^2$ and $\sigma_\beta^2$, respectively.
Mixed models with A fixed and B random or A random and B fixed
are also widely used and will have analogous assumptions. For example, if
we assume that A is fixed and B is random, then the $\beta_{j(i)}$'s are uncorrelated and
randomly distributed with mean zero and variance $\sigma_\beta^2$, subject to the restriction
that $\sum_{i=1}^{a}\alpha_i = 0$.


6.3 ANALYSIS OF VARIANCE


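Starting with the identity

$$y_{ijk} - \bar{y}_{...} = (\bar{y}_{i..} - \bar{y}_{...}) + (\bar{y}_{ij.} - \bar{y}_{i..}) + (y_{ijk} - \bar{y}_{ij.}), \qquad (6.3.1)$$

squaring both sides, and summing over i, j, and k, the total sum of squares can
be partitioned as

$$\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(y_{ijk} - \bar{y}_{...})^2 = bn\sum_{i=1}^{a}(\bar{y}_{i..} - \bar{y}_{...})^2 + n\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{y}_{ij.} - \bar{y}_{i..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(y_{ijk} - \bar{y}_{ij.})^2, \qquad (6.3.2)$$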
with usual notations of dots and bars. The relation (6.3.2) is valid since the
cross-product terms are equal to zero. The identity (6.3.2) states that the total
sum of squares can be partitioned into the following components:

(i) a sum of squares due to factor A,
(ii) a sum of squares due to factor B within the levels of A, and
(iii) a sum of squares due to the residual or error.

As before, the equation (6.3.2) may be written symbolically as

$$SS_T = SS_A + SS_{B(A)} + SS_E. \qquad (6.3.3)$$


There are $abn - 1$ degrees of freedom for $SS_T$, $a - 1$ degrees of freedom for
$SS_A$, $a(b - 1)$ degrees of freedom for $SS_{B(A)}$, and $ab(n - 1)$ degrees of freedom
for the error. Note that $(a - 1) + a(b - 1) + ab(n - 1) = abn - 1$. The mean
squares obtained by dividing each sum of squares on the right side of (6.3.3) by its
degrees of freedom are denoted by $MS_A$, $MS_{B(A)}$, and $MS_E$, respectively. The expected
mean squares can be derived as before. The traditional analysis of variance table,
summarizing the partition of the total sum of squares and degrees of freedom
together with the expected mean squares, is shown in Table 6.2.
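As a minimal sketch (not the book's own procedure), the partition (6.3.2) can be computed directly from its definitional form. The function name `nested_anova`, the nested-list layout `data[i][j]` (the n replicates of level j of B within level i of A), and the small data set are hypothetical, chosen only for illustration; balanced data are assumed.

```python
def nested_anova(data):
    """Sums of squares and degrees of freedom for a balanced two-way nested design.

    data[i][j] is the list of n replicates for level j of B nested in level i of A.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    # Cell means (ybar_ij.), A-level means (ybar_i..), and grand mean (ybar_...);
    # with balanced data, averaging means gives the same result as averaging raw scores.
    cell = [[sum(reps) / n for reps in row] for row in data]
    a_mean = [sum(row) / b for row in cell]
    grand = sum(a_mean) / a
    ss = {
        "A": b * n * sum((m - grand) ** 2 for m in a_mean),
        "B(A)": n * sum((cell[i][j] - a_mean[i]) ** 2
                        for i in range(a) for j in range(b)),
        "E": sum((y - cell[i][j]) ** 2
                 for i in range(a) for j in range(b) for y in data[i][j]),
        "T": sum((y - grand) ** 2 for row in data for reps in row for y in reps),
    }
    df = {"A": a - 1, "B(A)": a * (b - 1), "E": a * b * (n - 1), "T": a * b * n - 1}
    return ss, df


# Hypothetical data: a = 2 levels of A, b = 2 levels of B within each, n = 2 replicates.
example = [[[3, 5], [6, 8]], [[9, 11], [10, 14]]]
ss, df = nested_anova(example)
ms = {k: ss[k] / df[k] for k in ("A", "B(A)", "E")}
print(ss)  # {'A': 60.5, 'B(A)': 13.0, 'E': 14.0, 'T': 87.5}
print(ms)  # {'A': 60.5, 'B(A)': 6.5, 'E': 3.5}
```

For these data the three components add up exactly to the total (60.5 + 13 + 14 = 87.5), and the degrees of freedom 1 + 2 + 4 add up to abn - 1 = 7.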


6.4 TESTS OF HYPOTHESES: THE ANALYSIS 
OF VARIANCE F TESTS 


Under the assumption of normality, the mean squares $MS_A$, $MS_{B(A)}$, and $MS_E$
are independently distributed, each as a multiple of a chi-square variable, so that
under a null hypothesis the ratio of two mean squares having the same expectation
is distributed as the variance ratio F. Table 6.2 suggests that if
the levels of A and B are fixed, then the hypotheses

$$H_0^A: \text{all } \alpha_i = 0 \quad \text{versus} \quad H_1^A: \text{not all } \alpha_i = 0$$

and

$$H_0^B: \text{all } \beta_{j(i)} = 0 \quad \text{versus} \quad H_1^B: \text{not all } \beta_{j(i)} = 0$$

can be tested by the ratios

$$F_A = MS_A / MS_E$$

and

$$F_B = MS_{B(A)} / MS_E,$$

respectively. It can be readily shown that under the null hypotheses $H_0^A$ and
$H_0^B$, we have


F, ~ Fla —1,ab(n — 1)] 


The Analysis of Variance 


352 


70 7 70 70 
[=f |=! [=f =! 
(1 — 9)P — q)D 
q ov q =D 
IF} 1p = {229 
n a 1 _ d qa | , = 
7ouq + 70 ae a + fou + Zo 7ouq + fou + 70 ee a + 70 
poxiy g ‘wopuey y wopuey g ‘paxly y wopuey g ‘wopuey Y paxiy g ‘paxi4 y 
Hl |PPOW Il |PPOW I |}PPOW 


aaenbs ueaw pa}dadxq 


“SS | — uqo [210], 

ASW ISS (I — u)qv 10g 
WaSW WSs (I—9)o V UlUIM g 
YSW YSS [-—0 V oy and 
aaenbs = sauenbs wopoal4 UOIJELLA 


uraw young jo saaisdaq JO 391Nn0S 
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and 
Fz ~ Fla(b — 1), ab(n — 1)]. 


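Thus, $H_0^A$ and $H_0^B$ are tested using the test statistics $F_A$ and $F_B$, respectively.
Similarly, if both A and B are random factors, we test¹

$$H_0^A: \sigma_\alpha^2 = 0 \quad \text{versus} \quad H_1^A: \sigma_\alpha^2 > 0$$

by

$$F_A = MS_A / MS_{B(A)}$$

and

$$H_0^B: \sigma_\beta^2 = 0 \quad \text{versus} \quad H_1^B: \sigma_\beta^2 > 0$$

by

$$F_B = MS_{B(A)} / MS_E.$$

Finally, if A is a fixed factor and B is random, then

$$H_0^A: \text{all } \alpha_i = 0 \quad \text{versus} \quad H_1^A: \text{not all } \alpha_i = 0$$

is tested by

$$F_A = MS_A / MS_{B(A)}$$

and

$$H_0^B: \sigma_\beta^2 = 0 \quad \text{versus} \quad H_1^B: \sigma_\beta^2 > 0$$

is tested by

$$F_B = MS_{B(A)} / MS_E.$$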
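The only thing that changes across the three models is the denominator of $F_A$. A small sketch of that rule follows; the function `nested_f_ratios` is hypothetical, and the mean squares passed in come from a made-up data set with a = b = n = 2. The ratios would still have to be compared with tabulated F percentage points, using a(b - 1) rather than ab(n - 1) as the denominator degrees of freedom for $F_A$ under Models II and III.

```python
def nested_f_ratios(ms_a, ms_ba, ms_e, model):
    """Return (F_A, F_B) for the two-way nested design, following Section 6.4.

    model = "I"   : A fixed, B fixed   -> both ratios use MS_E
    model = "II"  : A random, B random -> F_A uses MS_B(A)
    model = "III" : A fixed, B random  -> F_A uses MS_B(A)
    """
    if model == "I":
        return ms_a / ms_e, ms_ba / ms_e
    if model in ("II", "III"):
        # With B random, E(MS_A) equals E(MS_B(A)) under the null hypothesis
        # for A, so MS_B(A) is the appropriate denominator for F_A.
        return ms_a / ms_ba, ms_ba / ms_e
    raise ValueError("model must be 'I', 'II', or 'III'")


# Mean squares from a small hypothetical data set (a = 2, b = 2, n = 2).
f_fixed = nested_f_ratios(60.5, 6.5, 3.5, "I")
f_mixed = nested_f_ratios(60.5, 6.5, 3.5, "III")
print([round(f, 4) for f in f_fixed])  # [17.2857, 1.8571]
print([round(f, 4) for f in f_mixed])  # [9.3077, 1.8571]
```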


Remark: If one does not reject the null hypothesis $H_0^B: \sigma_\beta^2 = 0$, then $MS_{B(A)}$ might be
considered to estimate the same population variance as does $MS_E$. Thus, as remarked
earlier in Section 4.6, some authors recommend computing a pooled mean square by
pooling the sums of squares $SS_{B(A)}$ and $SS_E$ and the corresponding degrees of freedom
$a(b - 1)$ and $ab(n - 1)$ in Table 6.2. This will theoretically provide a more powerful
test for differences due to levels of factor A. However, as indicated before, there is no
widespread agreement on this matter and care should be exercised in resorting to the pooling
procedure.

¹ For some results on approximate tests for other hypotheses concerning $\sigma_\alpha^2$ and $\sigma_\beta^2$, involving a
nonzero value and one- or two-sided alternatives, see Hartung and Voet (1987) and Burdick and
Graybill (1992, pp. 82-83).


6.5 POINT ESTIMATION 


In this section, we present results on point estimation for parameters of interest 
under fixed, random, and mixed effects models. 


MODEL I (FIXED EFFECTS) 


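If both factors A and B are fixed, the parameters $\mu$, $\alpha_i$'s, and $\beta_{j(i)}$'s may be
estimated by the least squares procedure by minimizing the quantity

$$Q = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(y_{ijk} - \mu - \alpha_i - \beta_{j(i)})^2 \qquad (6.5.1)$$

with respect to $\mu$, $\alpha_i$, and $\beta_{j(i)}$, and subject to the restrictions:

$$\sum_{i=1}^{a}\alpha_i = 0, \qquad \sum_{j=1}^{b}\beta_{j(i)} = 0 \quad (i = 1, 2, \ldots, a). \qquad (6.5.2)$$

It can be readily shown by the methods of elementary calculus that the resulting
solutions are:

$$\hat{\mu} = \bar{y}_{...}, \qquad (6.5.3)$$

$$\hat{\alpha}_i = \bar{y}_{i..} - \bar{y}_{...}, \quad i = 1, 2, \ldots, a, \qquad (6.5.4)$$

and

$$\hat{\beta}_{j(i)} = \bar{y}_{ij.} - \bar{y}_{i..}, \quad j = 1, 2, \ldots, b;\ i = 1, 2, \ldots, a. \qquad (6.5.5)$$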


It should be observed that the estimators (6.5.3) through (6.5.5) have consider- 
able intuitive appeal; the A treatment effects are estimated by the average of all 
observations under each level of A minus the grand mean, and the B treatment 
effects within each level of A are estimated by the corresponding cell average 
minus the average under the level of A. 
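A minimal sketch of the estimators (6.5.3) through (6.5.5), assuming balanced data stored as nested lists `data[i][j]`; the function name `nested_ls_estimates` and the small data set are hypothetical. It also makes the restrictions (6.5.2) easy to check: the $\hat{\alpha}_i$'s sum to zero, and so do the $\hat{\beta}_{j(i)}$'s within each level of A.

```python
def nested_ls_estimates(data):
    """Least squares estimates (6.5.3)-(6.5.5) for the fixed effects nested model.

    data[i][j] is the list of n replicates for level j of B within level i of A.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    cell = [[sum(reps) / n for reps in row] for row in data]  # ybar_ij.
    a_mean = [sum(row) / b for row in cell]                   # ybar_i..
    mu_hat = sum(a_mean) / a                                  # ybar_...
    alpha_hat = [m - mu_hat for m in a_mean]
    beta_hat = [[cell[i][j] - a_mean[i] for j in range(b)] for i in range(a)]
    return mu_hat, alpha_hat, beta_hat


mu_hat, alpha_hat, beta_hat = nested_ls_estimates([[[3, 5], [6, 8]], [[9, 11], [10, 14]]])
print(mu_hat, alpha_hat, beta_hat)  # 8.25 [-2.75, 2.75] [[-1.5, 1.5], [-1.0, 1.0]]
```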

The estimators (6.5.3) through (6.5.5) have variances given by

$$\mathrm{Var}(\hat{\mu}) = \sigma_e^2 / abn, \qquad (6.5.6)$$

$$\mathrm{Var}(\hat{\alpha}_i) = (a - 1)\sigma_e^2 / abn, \qquad (6.5.7)$$


Two-Way Nested (Hierarchical) Classification 355 


and

$$\mathrm{Var}(\hat{\beta}_{j(i)}) = (b - 1)\sigma_e^2 / bn. \qquad (6.5.8)$$


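The other parameters of interest may include: $\mu + \alpha_i$ (means of factor level A),
pairwise differences $\alpha_i - \alpha_{i'}$, $\beta_{j(i)} - \beta_{j'(i)}$, and the contrasts of the type

$$\sum_{i=1}^{a}\ell_i\alpha_i \quad \left(\sum_{i=1}^{a}\ell_i = 0\right) \quad \text{and} \quad \sum_{j=1}^{b}\ell'_j\beta_{j(i)} \quad \left(\sum_{j=1}^{b}\ell'_j = 0\right).$$

Their respective estimates along with the variances are:

$$\widehat{\mu + \alpha_i} = \bar{y}_{i..}, \quad \mathrm{Var}(\widehat{\mu + \alpha_i}) = \sigma_e^2 / bn; \qquad (6.5.9)$$

$$\widehat{\alpha_i - \alpha_{i'}} = \bar{y}_{i..} - \bar{y}_{i'..}, \quad \mathrm{Var}(\widehat{\alpha_i - \alpha_{i'}}) = 2\sigma_e^2 / bn; \qquad (6.5.10)$$

$$\widehat{\beta_{j(i)} - \beta_{j'(i)}} = \bar{y}_{ij.} - \bar{y}_{ij'.}, \quad \mathrm{Var}(\widehat{\beta_{j(i)} - \beta_{j'(i)}}) = 2\sigma_e^2 / n; \qquad (6.5.11)$$

$$\sum_{i=1}^{a}\ell_i\hat{\alpha}_i = \sum_{i=1}^{a}\ell_i\bar{y}_{i..}, \quad \mathrm{Var}\left(\sum_{i=1}^{a}\ell_i\hat{\alpha}_i\right) = \left(\sum_{i=1}^{a}\ell_i^2\right)\sigma_e^2 / bn; \qquad (6.5.12)$$

and

$$\sum_{j=1}^{b}\ell'_j\hat{\beta}_{j(i)} = \sum_{j=1}^{b}\ell'_j\bar{y}_{ij.}, \quad \mathrm{Var}\left(\sum_{j=1}^{b}\ell'_j\hat{\beta}_{j(i)}\right) = \left(\sum_{j=1}^{b}\ell'^2_j\right)\sigma_e^2 / n. \qquad (6.5.13)$$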


The best quadratic unbiased estimator of $\sigma_e^2$ is, of course, provided by the error
mean square $MS_E$.


MODEL II (RANDOM EFFECTS) 


When both factors A and B are random, the analysis of variance method can be
used to estimate the variance components $\sigma_e^2$, $\sigma_\beta^2$, and $\sigma_\alpha^2$. From the expected
mean squares column under Model II of Table 6.2, we obtain

$$\hat{\sigma}_e^2 = MS_E, \qquad (6.5.14)$$

$$\hat{\sigma}_\beta^2 = (MS_{B(A)} - MS_E)/n, \qquad (6.5.15)$$

and

$$\hat{\sigma}_\alpha^2 = (MS_A - MS_{B(A)})/bn. \qquad (6.5.16)$$


The optimal properties of the analysis of variance estimators discussed in Section 2.9
also apply here. However, again $\hat{\sigma}_\beta^2$ and $\hat{\sigma}_\alpha^2$ can produce negative
estimates.² A negative estimate may provide an indication that the corresponding
variance component may be zero. One then might want to replace any
negative estimates by zero, pool the adjacent mean squares, and subtract the
pooled mean square from the next higher mean square for estimating the
corresponding variance component. Finally, we may note that a minimum variance
unbiased estimator for $\sigma_\beta^2/\sigma_e^2$ is given by

$$\frac{1}{n}\left[\frac{MS_{B(A)}}{MS_E}\left(1 - \frac{2}{ab(n - 1)}\right) - 1\right]. \qquad (6.5.17)$$
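The estimators (6.5.14) through (6.5.17) translate into a few lines of code. This is a sketch under stated assumptions: the function name `anova_variance_components` is hypothetical, negative component estimates are simply truncated at zero (the pooling step described above is not implemented), and the mean squares come from a small made-up data set with a = b = n = 2.

```python
def anova_variance_components(ms_a, ms_ba, ms_e, a, b, n):
    """ANOVA estimators (6.5.14)-(6.5.16) and the estimator (6.5.17) of
    sigma_beta^2 / sigma_e^2 for the random effects nested model.

    Negative component estimates are set to zero here; the ratio estimator
    (6.5.17) is returned as is, since truncating it would destroy unbiasedness.
    """
    s2_e = ms_e
    s2_b = max((ms_ba - ms_e) / n, 0.0)
    s2_a = max((ms_a - ms_ba) / (b * n), 0.0)
    nu = a * b * (n - 1)  # error degrees of freedom
    if nu <= 2:
        raise ValueError("(6.5.17) requires ab(n - 1) > 2")
    ratio = ((ms_ba / ms_e) * (1 - 2 / nu) - 1) / n
    return s2_e, s2_b, s2_a, ratio


print(anova_variance_components(60.5, 6.5, 3.5, a=2, b=2, n=2))
```

With these inputs the component estimates are 3.5, 1.5, and 13.5, while the ratio estimator equals -1/28: an unbiased estimator of a nonnegative quantity can itself be negative.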


MODEL III (MIXED EFFECTS) 


Many applications of this design involve a mixed model with the main factor
A fixed and the nested factor B random. For a mixed model situation, the fixed
effects $\alpha_i$'s are estimated by

$$\hat{\alpha}_i = \bar{y}_{i..} - \bar{y}_{...}, \quad i = 1, 2, \ldots, a,$$

and the variance components $\sigma_e^2$ and $\sigma_\beta^2$ can be estimated by eliminating the line
corresponding to A from the mean square column of Table 6.2 and applying
the analysis of variance method to the next two lines. Table 6.3 summarizes the
results on point estimates of some common parameters of interest.


6.6 INTERVAL ESTIMATION 


In this section, we summarize results on confidence intervals for parameters of 
interest under fixed, random, and mixed effects models. 


MODEL I (FIXED EFFECTS)


An exact confidence interval for $\sigma_e^2$ can be based on the chi-square distribution
of $ab(n - 1)MS_E/\sigma_e^2$. Thus, a $1 - \alpha$ level confidence interval for $\sigma_e^2$ is

$$\frac{ab(n - 1)MS_E}{\chi^2[ab(n - 1), 1 - \alpha/2]} < \sigma_e^2 < \frac{ab(n - 1)MS_E}{\chi^2[ab(n - 1), \alpha/2]}. \qquad (6.6.1)$$
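A sketch of (6.6.1). The Python standard library has no chi-square quantile function, so the two percentage points are passed in by the caller. The values used below (about 0.484 and 11.143 for 4 degrees of freedom) are approximate table values, and $MS_E = 3.5$ with $ab(n - 1) = 4$ comes from a small hypothetical data set; the function name is likewise hypothetical.

```python
def error_variance_interval(ms_e, df_e, chi2_lower, chi2_upper):
    """Interval (6.6.1) for sigma_e^2.

    df_e       : error degrees of freedom, ab(n - 1)
    chi2_lower : chi^2[df_e, alpha/2]      (lower-tail percentage point)
    chi2_upper : chi^2[df_e, 1 - alpha/2]
    Both percentage points must be supplied, e.g. from a chi-square table.
    """
    ss_e = df_e * ms_e  # ab(n - 1) MS_E equals SS_E
    return ss_e / chi2_upper, ss_e / chi2_lower


# Hypothetical MS_E = 3.5 with ab(n - 1) = 4; approximate 95% table points.
ci = error_variance_interval(3.5, 4, chi2_lower=0.484, chi2_upper=11.143)
print(ci)
```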


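Furthermore, confidence intervals based on the t distribution for a particular
treatment or factor level mean $\mu + \alpha_i$ or $\alpha_i$ can be readily obtained. For example,

$$\bar{y}_{i..} - t[ab(n - 1), 1 - \alpha/2]\sqrt{\frac{MS_E}{bn}} < \mu + \alpha_i < \bar{y}_{i..} + t[ab(n - 1), 1 - \alpha/2]\sqrt{\frac{MS_E}{bn}}$$

and

$$(\bar{y}_{i..} - \bar{y}_{...}) - t[ab(n - 1), 1 - \alpha/2]\sqrt{\frac{(a - 1)MS_E}{abn}} < \alpha_i < (\bar{y}_{i..} - \bar{y}_{...}) + t[ab(n - 1), 1 - \alpha/2]\sqrt{\frac{(a - 1)MS_E}{abn}}$$

give exact $1 - \alpha$ level confidence intervals for $\mu + \alpha_i$ and $\alpha_i$, respectively.

² For a discussion of the maximum likelihood and other nonnegative estimation procedures and
their properties, the reader is referred to Sahai (1974b, 1976).

TABLE 6.3
Point Estimates and Their Variances under Model III

Parameter                          Point Estimate                Variance of Estimate
$\mu$                              $\bar{y}_{...}$                $(\sigma_e^2 + n\sigma_\beta^2)/abn$
$\mu + \alpha_i$                   $\bar{y}_{i..}$                $(\sigma_e^2 + n\sigma_\beta^2)/bn$
$\alpha_i$                         $\bar{y}_{i..} - \bar{y}_{...}$  $(a - 1)(\sigma_e^2 + n\sigma_\beta^2)/abn$
$\alpha_i - \alpha_{i'}$           $\bar{y}_{i..} - \bar{y}_{i'..}$ $2(\sigma_e^2 + n\sigma_\beta^2)/bn$
$\sum_{i=1}^{a}\ell_i\alpha_i$ $(\sum_{i=1}^{a}\ell_i = 0)$   $\sum_{i=1}^{a}\ell_i\bar{y}_{i..}$   $(\sum_{i=1}^{a}\ell_i^2)(\sigma_e^2 + n\sigma_\beta^2)/bn$
$\sigma_e^2$                       $MS_E$                         $2\sigma_e^4/ab(n - 1)$
$\sigma_e^2 + n\sigma_\beta^2$     $MS_{B(A)}$                    $2(\sigma_e^2 + n\sigma_\beta^2)^2/a(b - 1)$
$\sigma_\beta^2$                   $(MS_{B(A)} - MS_E)/n$         $\frac{2}{n^2}\left[\frac{(\sigma_e^2 + n\sigma_\beta^2)^2}{a(b - 1)} + \frac{\sigma_e^4}{ab(n - 1)}\right]$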
Similarly, to obtain confidence limits for $\alpha_i - \alpha_{i'}$, we note from (6.5.10) that

$$E(\bar{y}_{i..} - \bar{y}_{i'..}) = \alpha_i - \alpha_{i'}$$

and

$$\mathrm{Var}(\bar{y}_{i..} - \bar{y}_{i'..}) = 2\sigma_e^2/bn.$$

The confidence limits can, therefore, be derived from the relation

$$\frac{(\bar{y}_{i..} - \bar{y}_{i'..}) - (\alpha_i - \alpha_{i'})}{\sqrt{2MS_E/bn}} \sim t[ab(n - 1)]. \qquad (6.6.2)$$

Similar results on confidence intervals for the $\beta_{j(i)}$'s and any pairwise differences
among them, using the t distribution, can also be obtained. However, multiple
comparison methods, discussed in Section 6.9, are usually preferable.
MODEL II (RANDOM EFFECTS)


An exact confidence interval for the error variance component $\sigma_e^2$ is of course
obtained as in (6.6.1). However, exact confidence intervals for the variance components
$\sigma_\beta^2$ and $\sigma_\alpha^2$ do not exist. One can, nevertheless, construct exact intervals
for $\sigma_\beta^2/\sigma_e^2$, $\sigma_\beta^2/(\sigma_e^2 + \sigma_\beta^2)$, and $\sigma_e^2/(\sigma_e^2 + \sigma_\beta^2)$ by using results on the sampling
distribution of the ratio of two mean squares. In particular, the probability is
$1 - \alpha$ that the interval

$$\frac{1}{n}\left(\frac{MS_{B(A)}}{MS_E}\cdot\frac{1}{F[a(b - 1), ab(n - 1); 1 - \alpha/2]} - 1\right) < \frac{\sigma_\beta^2}{\sigma_e^2} < \frac{1}{n}\left(\frac{MS_{B(A)}}{MS_E}\cdot\frac{1}{F[a(b - 1), ab(n - 1); \alpha/2]} - 1\right) \qquad (6.6.3)$$

captures $\sigma_\beta^2/\sigma_e^2$. Exact intervals on $\sigma_\beta^2/(\sigma_e^2 + \sigma_\beta^2)$ and $\sigma_e^2/(\sigma_e^2 + \sigma_\beta^2)$ are obtained
from (6.6.3) using appropriate transformations.³ Similarly, from a comparison
of two confidence intervals, one for $\sigma_e^2$ and the other for $\sigma_e^2 + n\sigma_\beta^2$, one can
obtain a conservative $100(1 - \alpha)$ percent confidence interval for $\sigma_\beta^2$ as

$$0 < \sigma_\beta^2 < \frac{1}{n}\left[\frac{a(b - 1)MS_{B(A)}}{\chi^2[a(b - 1), \alpha/2]} - \frac{ab(n - 1)MS_E}{\chi^2[ab(n - 1), 1 - \alpha/2]}\right].$$
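Both intervals above can be sketched in code once the F and chi-square percentage points are looked up. The function names are hypothetical; the mean squares come from a small made-up data set with a = b = n = 2, and the percentage points (about 10.649 for F(2, 4) at 0.975, 1/39.248 for F(2, 4) at 0.025, 0.0506 for chi-square with 2 degrees of freedom at 0.025, and 11.143 for chi-square with 4 degrees of freedom at 0.975) are approximate table values supplied by hand.

```python
def ratio_interval(ms_ba, ms_e, n, f_upper, f_lower):
    """Interval (6.6.3) for sigma_beta^2 / sigma_e^2 (Model II).

    f_upper = F[a(b - 1), ab(n - 1); 1 - alpha/2]
    f_lower = F[a(b - 1), ab(n - 1); alpha/2]
    """
    r = ms_ba / ms_e
    return (r / f_upper - 1) / n, (r / f_lower - 1) / n


def conservative_sigma_beta_upper(ms_ba, ms_e, a, b, n, chi2_ba_lower, chi2_e_upper):
    """Upper limit of the conservative interval for sigma_beta^2.

    chi2_ba_lower = chi^2[a(b - 1), alpha/2]; chi2_e_upper = chi^2[ab(n - 1), 1 - alpha/2].
    """
    upper_total = a * (b - 1) * ms_ba / chi2_ba_lower    # upper limit for sigma_e^2 + n sigma_beta^2
    lower_error = a * b * (n - 1) * ms_e / chi2_e_upper  # lower limit for sigma_e^2
    return (upper_total - lower_error) / n


lo, hi = ratio_interval(6.5, 3.5, n=2, f_upper=10.649, f_lower=1 / 39.248)
print(max(lo, 0.0), hi)  # a negative lower limit is usually reported as zero
print(conservative_sigma_beta_upper(6.5, 3.5, 2, 2, 2, chi2_ba_lower=0.0506, chi2_e_upper=11.143))
```

With so few degrees of freedom both intervals are very wide, which is a reminder that variance components of a nested factor are poorly determined unless a(b - 1) is reasonably large.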


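For the random effects model, one sometimes may also want to determine a
confidence interval for $\mu$. To obtain confidence limits for $\mu$, we note that

$$E(\bar{y}_{...}) = \mu$$

and

$$\mathrm{Var}(\bar{y}_{...}) = \frac{\sigma_e^2 + n\sigma_\beta^2 + bn\sigma_\alpha^2}{abn}.$$

Now, $\mathrm{Var}(\bar{y}_{...})$ can be estimated by $MS_A/abn$, and it follows that

$$\frac{\bar{y}_{...} - \mu}{\sqrt{MS_A/abn}} \sim t[a - 1].$$

The confidence limits for $\mu$ can now be determined in the usual way from the
t distribution with $a - 1$ degrees of freedom.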
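A minimal sketch of the interval for $\mu$ just described. The function name `grand_mean_interval` is hypothetical, the summary statistics come from a small made-up data set, and the t percentage point (about 12.706 for 1 degree of freedom at 0.975) is an approximate table value supplied by the caller, since the standard library provides no t quantile function.

```python
import math


def grand_mean_interval(grand_mean, ms_a, a, b, n, t_value):
    """Model II interval for mu: ybar_... +/- t[a - 1, 1 - alpha/2] * sqrt(MS_A / abn).

    t_value must be supplied (e.g. from a t table) for a - 1 degrees of freedom.
    """
    half_width = t_value * math.sqrt(ms_a / (a * b * n))
    return grand_mean - half_width, grand_mean + half_width


# Hypothetical summary: ybar_... = 8.25, MS_A = 60.5, a = b = n = 2.
print(grand_mean_interval(8.25, 60.5, 2, 2, 2, t_value=12.706))
```

With a = 2 there is only a - 1 = 1 degree of freedom, so the interval is extremely wide; the precision for $\mu$ depends mainly on how many levels of A are sampled, not on n.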


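³ For some results on confidence intervals for the variance components $\sigma_\alpha^2$ and $\sigma_\beta^2$, the total variance
$\sigma_e^2 + \sigma_\beta^2 + \sigma_\alpha^2$, ratios of variance components, and proportions of variability such as
$\sigma_\beta^2/(\sigma_e^2 + \sigma_\beta^2)$, $\sigma_e^2/(\sigma_e^2 + \sigma_\beta^2 + \sigma_\alpha^2)$, $\sigma_\beta^2/(\sigma_e^2 + \sigma_\beta^2 + \sigma_\alpha^2)$, and $\sigma_\alpha^2/(\sigma_e^2 + \sigma_\beta^2 + \sigma_\alpha^2)$, see Burdick
and Graybill (1992, pp. 80-90).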
MODEL III (MIXED EFFECTS)


An exact confidence interval for $\sigma_e^2$ is of course given as in (6.6.1). An exact
confidence interval for $\sigma_\beta^2$, however, does not exist. One can obtain an exact
interval for $\sigma_\beta^2/\sigma_e^2$ by using a procedure based on the statistic $MS_{B(A)}/MS_E$.
Also, as in the case of Model I, it is possible to obtain confidence intervals for
$\mu$, $\alpha_i$, $\mu + \alpha_i$, $\alpha_i - \alpha_{i'}$, the contrast $\sum_{i=1}^{a}\ell_i\alpha_i$ $(\sum_{i=1}^{a}\ell_i = 0)$, or any linear combination
of the means $\sum_{i=1}^{a}\ell_i(\mu + \alpha_i)$, where the $\ell_i$'s are any set of constants.
Thus, for example, exact $100(1 - \alpha)$ percent confidence intervals for $\mu + \alpha_i$
and $\sum_{i=1}^{a}\ell_i(\mu + \alpha_i)$ are given by

$$\bar{y}_{i..} \pm t[a(b - 1), 1 - \alpha/2]\sqrt{\frac{MS_{B(A)}}{bn}}$$

and

$$\sum_{i=1}^{a}\ell_i\bar{y}_{i..} \pm t[a(b - 1), 1 - \alpha/2]\sqrt{\frac{MS_{B(A)}}{bn}\sum_{i=1}^{a}\ell_i^2},$$

respectively. However, again, multiple comparison methods discussed in
Section 6.9 are to be preferred.


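6.7 COMPUTATIONAL FORMULAE AND PROCEDURE

The computational formulae for the sums of squares may be obtained by expanding the corresponding definitional formulae and simplifying the algebra. They are:

$$
\mathrm{SS}_T = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}^2 - \frac{y_{...}^2}{abn},
$$

$$
\mathrm{SS}_A = \frac{1}{bn}\sum_{i=1}^{a} y_{i..}^2 - \frac{y_{...}^2}{abn},
$$

$$
\mathrm{SS}_{B(A)} = \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij.}^2 - \frac{1}{bn}\sum_{i=1}^{a} y_{i..}^2,
$$

and

$$
\mathrm{SS}_E = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}^2 - \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij.}^2.
$$

Note that $\mathrm{SS}_{B(A)}$ can be written as

$$
\mathrm{SS}_{B(A)} = \sum_{i=1}^{a}\left[\frac{1}{n}\sum_{j=1}^{b} y_{ij.}^2 - \frac{y_{i..}^2}{bn}\right].
$$

This shows that $\mathrm{SS}_{B(A)}$ is the sum of squares between the levels of B within each level of A, summed over all levels of A.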
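The computational formulae translate directly into code. The sketch below uses hypothetical function names. It also checks the per-level form of $\mathrm{SS}_{B(A)}$ against the pooled form. The test data are the pigment paste moisture readings analyzed in Section 6.12 (Box et al., 1978): 15 batches, 2 samples per batch, and 2 analyses per sample. The values listed for batches 5 and 10 reproduce every sum of squares reported in that section.

```python
def nested_ss(y):
    """Balanced two-way nested sums of squares; y[i][j][k] with a, b, n levels."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    obs = [v for level in y for cell in level for v in cell]
    cf = sum(obs) ** 2 / (a * b * n)                              # y...^2 / abn
    raw = sum(v * v for v in obs)                                 # sum of y_ijk^2
    cell_totals = [[sum(cell) for cell in level] for level in y]  # y_ij.
    a_totals = [sum(row) for row in cell_totals]                  # y_i..
    cells = sum(t * t for row in cell_totals for t in row) / n
    levels = sum(t * t for t in a_totals) / (b * n)
    return {"T": raw - cf, "A": levels - cf, "B(A)": cells - levels, "E": raw - cells}

def ss_b_within_a_by_level(y):
    """Alternative form: between-B sum of squares within each level of A, summed over A."""
    b, n = len(y[0]), len(y[0][0])
    total = 0.0
    for level in y:
        cell_totals = [sum(cell) for cell in level]
        total += sum(t * t for t in cell_totals) / n - sum(cell_totals) ** 2 / (b * n)
    return total

# Moisture content of pigment paste: 15 batches x 2 samples x 2 analyses.
pigment = [
    [[40, 39], [30, 30]], [[26, 28], [25, 26]], [[29, 28], [14, 15]],
    [[30, 31], [24, 24]], [[19, 20], [17, 17]], [[33, 32], [26, 24]],
    [[23, 24], [32, 33]], [[34, 34], [29, 29]], [[27, 27], [31, 31]],
    [[13, 16], [27, 24]], [[25, 23], [25, 27]], [[29, 29], [31, 32]],
    [[19, 20], [29, 30]], [[23, 24], [25, 25]], [[39, 37], [26, 28]],
]
ss = nested_ss(pigment)
print({k: round(v, 3) for k, v in ss.items()})
```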


6.8 POWER OF THE ANALYSIS OF VARIANCE F TESTS

For the fixed effects model (Model I), we can calculate the power of each of the F tests given in Section 6.4 in the usual way. Thus, when B effects are investigated, we have

$$
\phi_{B(A)} = \frac{1}{\sigma}\sqrt{\frac{n\sum_{i=1}^{a}\sum_{j=1}^{b}\beta_{j(i)}^2}{a(b-1)+1}}.
$$

Similarly, for A effects, we have

$$
\phi_A = \frac{1}{\sigma}\sqrt{\frac{bn\sum_{i=1}^{a}\alpha_i^2}{a}},
$$

and so on. Apart from these changes in $\phi$, the power calculations remain unchanged. Power formulae under Models II and III can be obtained similarly.
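A minimal sketch of the two noncentrality parameters follows. The function names and the effect values are hypothetical and serve only for illustration. The resulting $\phi$ would then be referred to power charts for the noncentral F distribution, such as the Pearson-Hartley charts, with numerator degrees of freedom $a-1$ or $a(b-1)$ and denominator degrees of freedom $ab(n-1)$.

```python
import math

def phi_a(alpha, b, n, sigma):
    """phi_A = (1/sigma) sqrt(bn * sum alpha_i^2 / a) for the balanced Model I."""
    a = len(alpha)
    return math.sqrt(b * n * sum(x * x for x in alpha) / a) / sigma

def phi_b_within_a(beta, n, sigma):
    """phi_B(A) = (1/sigma) sqrt(n * sum_ij beta_j(i)^2 / (a(b-1) + 1)); beta[i][j]."""
    a, b = len(beta), len(beta[0])
    ss = sum(x * x for row in beta for x in row)
    return math.sqrt(n * ss / (a * (b - 1) + 1)) / sigma

# Hypothetical effects, for illustration only.
print(phi_a([1.0, -1.0], b=2, n=3, sigma=1.0))
print(phi_b_within_a([[1.0, -1.0], [0.5, -0.5]], n=3, sigma=2.0))
```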


6.9 MULTIPLE COMPARISON METHODS

When the factor A effect is fixed and the null hypothesis concerning A effects is rejected, we may want to use Tukey's, Scheffé's, or other methods to investigate contrasts of interest. For example, if $H_0^A$ is rejected, we may want to investigate contrasts involving the $\alpha_i$'s of the form

$$
L = \sum_{i=1}^{a}\ell_i\alpha_i \qquad \left(\sum_{i=1}^{a}\ell_i = 0\right),
$$

which is estimated by

$$
\hat{L} = \sum_{i=1}^{a}\ell_i\bar{y}_{i..}.
$$

Now, if B is also fixed (i.e., we have a Model I situation), then

$$
\widehat{\mathrm{Var}}(\hat{L}) = \frac{\mathrm{MS}_E}{bn}\sum_{i=1}^{a}\ell_i^2.
$$

Further, for Tukey's procedure involving pairwise differences, we have

$$
T = q[a,\,ab(n-1);\,1-\alpha].
$$

For Scheffé's procedure involving general contrasts, we would have

$$
S^2 = (a-1)F[a-1,\,ab(n-1);\,1-\alpha].
$$

In particular, for any $m$ pairwise comparisons of the type $\alpha_i - \alpha_{i'}$, the $(1-\alpha)$-level Bonferroni intervals can be obtained as

$$
(\bar{y}_{i..} - \bar{y}_{i'..}) \pm t[ab(n-1),\,1-\alpha/2m]\sqrt{\frac{2\,\mathrm{MS}_E}{bn}}.
$$

Furthermore, since there are only $a-1$ independent comparisons of this form, the $(1-\alpha)$-level Scheffé confidence intervals are given by

$$
(\bar{y}_{i..} - \bar{y}_{i'..}) \pm \left\{(a-1)F[a-1,\,ab(n-1);\,1-\alpha]\,\frac{2\,\mathrm{MS}_E}{bn}\right\}^{1/2}.
$$

Similarly, when $H_0^B$ is rejected, simultaneous intervals for any contrast among the $\beta_{j(i)}$'s, or for any pairwise differences among them, can also be obtained. Thus, for any fixed $i$, the $(1-\alpha)$-level Bonferroni simultaneous confidence intervals for $\beta_{j(i)} - \beta_{j'(i)}$ are given by

$$
(\bar{y}_{ij.} - \bar{y}_{ij'.}) \pm t[ab(n-1),\,1-\alpha/2m]\sqrt{\frac{2\,\mathrm{MS}_E}{n}}.
$$

The $(1-\alpha)$-level Scheffé simultaneous confidence intervals are given by

$$
(\bar{y}_{ij.} - \bar{y}_{ij'.}) \pm \left\{(b-1)F[b-1,\,ab(n-1);\,1-\alpha]\,\frac{2\,\mathrm{MS}_E}{n}\right\}^{1/2}.
$$

For the case when B is random (i.e., we have a mixed effects or Model III situation),

$$
\widehat{\mathrm{Var}}(\hat{L}) = \frac{\mathrm{MS}_{B(A)}}{bn}\sum_{i=1}^{a}\ell_i^2, \qquad T = q[a,\,a(b-1);\,1-\alpha],
$$

and

$$
S^2 = (a-1)F[a-1,\,a(b-1);\,1-\alpha].
$$

For making pairwise comparisons, Bonferroni intervals are given by

$$
(\bar{y}_{i..} - \bar{y}_{i'..}) \pm t[a(b-1),\,1-\alpha/2m]\sqrt{\frac{2\,\mathrm{MS}_{B(A)}}{bn}},
$$

which give an overall level of at least $1-\alpha$, where $m$ is the number of intervals constructed.
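All of the pairwise intervals above share one shape: a difference of means plus or minus a critical multiplier times $\sqrt{2\,\mathrm{MS}/(\text{units})}$. The sketch below captures that shape. The helper names are hypothetical, and the critical values are supplied by the caller, for example from tables or from `scipy.stats.t.ppf` and `scipy.stats.f.ppf`.

```python
import math

def pairwise_interval(mean_1, mean_2, ms, units, crit):
    """(mean_1 - mean_2) ± crit * sqrt(2 * ms / units).

    (ms, units) pairs used in the text: (MS_E, bn) for A under Model I,
    (MS_E, n) for B within a level of A, and (MS_B(A), bn) for A under Model III.
    crit is t[nu, 1 - alpha/2m] for Bonferroni, or scheffe_crit(...) for Scheffe.
    """
    diff = mean_1 - mean_2
    half = crit * math.sqrt(2.0 * ms / units)
    return diff - half, diff + half

def scheffe_crit(k, f_quantile):
    """Scheffe multiplier sqrt(k * F[k, nu; 1 - alpha]), with k = a - 1 or b - 1."""
    return math.sqrt(k * f_quantile)
```

Keeping the critical value outside the function makes the Bonferroni and Scheffé versions differ only in the multiplier, which mirrors how the formulas above are built.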


6.10 UNEQUAL NUMBERS IN THE SUBCLASSES

In experiments involving a nested classification, it is important to try to keep the sizes of the nested factors (subsamples) equal. When the sizes of the subsamples are unequal, the analysis of variance and the expressions for the expected mean squares in the analysis of variance table become quite complicated. In this section, we briefly indicate an analysis of variance when there are unequal numbers in the subclasses.⁴

Define the following notation:

$a$ = number of levels of factor A;
$b_i$ = number of levels of factor B within the $i$-th level of factor A;
$n_{ij}$ = number of replications from the $j$-th level of factor B within the $i$-th level of factor A;
$n_{i.} = \sum_{j=1}^{b_i} n_{ij}$ = number of observations at the $i$-th level of factor A; and
$N = \sum_{i=1}^{a} n_{i.} = \sum_{i=1}^{a}\sum_{j=1}^{b_i} n_{ij}$ = total number of observations in the experiment.


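Now, the sums of squares can be defined by analogy with the corresponding balanced case. They are:

$$
\mathrm{SS}_T = \sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{n_{ij}}(y_{ijk}-\bar{y}_{...})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{n_{ij}} y_{ijk}^2 - \frac{y_{...}^2}{N},
$$

$$
\mathrm{SS}_A = \sum_{i=1}^{a} n_{i.}(\bar{y}_{i..}-\bar{y}_{...})^2 = \sum_{i=1}^{a}\frac{y_{i..}^2}{n_{i.}} - \frac{y_{...}^2}{N},
$$

$$
\mathrm{SS}_{B(A)} = \sum_{i=1}^{a}\sum_{j=1}^{b_i} n_{ij}(\bar{y}_{ij.}-\bar{y}_{i..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b_i}\frac{y_{ij.}^2}{n_{ij}} - \sum_{i=1}^{a}\frac{y_{i..}^2}{n_{i.}},
$$

and

$$
\mathrm{SS}_E = \sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{n_{ij}}(y_{ijk}-\bar{y}_{ij.})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{n_{ij}} y_{ijk}^2 - \sum_{i=1}^{a}\sum_{j=1}^{b_i}\frac{y_{ij.}^2}{n_{ij}},
$$

where

$$
y_{ij.} = \sum_{k=1}^{n_{ij}} y_{ijk}, \quad \bar{y}_{ij.} = y_{ij.}/n_{ij}, \quad y_{i..} = \sum_{j=1}^{b_i}\sum_{k=1}^{n_{ij}} y_{ijk}, \quad \bar{y}_{i..} = y_{i..}/n_{i.},
$$

and

$$
y_{...} = \sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{n_{ij}} y_{ijk}, \quad \bar{y}_{...} = y_{...}/N.
$$

4 For the design with unequal numbers in the subclasses there is no unique analysis of variance. The conventional analysis of variance presented here is based on quadratic forms commonly known as Type I sums of squares.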
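The unbalanced formulae can be sketched for ragged data as follows. The function name is hypothetical. The test data are the breeding observations of Section 6.13 (Graybill, 1961), read from Table 6.9: 4 sires, 12 dams, and $N = 52$. They reproduce the dam totals and the sums of squares reported there. Because the text works from totals rounded to three decimals, agreement is to about the third decimal.

```python
def nested_ss_unbalanced(y):
    """Type I sums of squares for the unbalanced two-way nested design.

    y[i][j] is the list of observations on level j of B within level i of A.
    """
    obs = [v for level in y for cell in level for v in cell]
    big_n = len(obs)
    raw = sum(v * v for v in obs)                                   # sum of y_ijk^2
    cf = sum(obs) ** 2 / big_n                                      # y...^2 / N
    cells = sum(sum(cell) ** 2 / len(cell) for level in y for cell in level)  # sum y_ij.^2/n_ij
    levels = sum(
        sum(sum(cell) for cell in level) ** 2 / sum(len(cell) for cell in level)
        for level in y
    )                                                               # sum y_i..^2 / n_i.
    return {"T": raw - cf, "A": levels - cf, "B(A)": cells - levels, "E": raw - cells}

# Breeding data of Section 6.13: sires -> dams -> observations.
sires = [
    [[32, 31, 23, 26], [30, 26, 29, 28, 18], [34, 30, 26, 34, 32, 31, 26]],
    [[26, 20, 18], [22, 31, 20, 21], [23, 21, 24, 26, 18], [21, 21, 30]],
    [[16, 20, 32], [14, 18, 16, 17]],
    [[31, 34, 41, 40], [42, 43, 40, 35, 29], [26, 25, 29, 40, 37]],
]
ss = nested_ss_unbalanced(sires)
print({k: round(v, 3) for k, v in ss.items()})
```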


The derivations for the expected mean squares can be found in Scheffé (1959, pp. 255-258) and Graybill (1961, pp. 354-357). The resultant analysis of variance table is shown in Table 6.4. The coefficients of the variance components in the expected mean square column are determined as follows:

$$
n_1 = \frac{N - \sum_{i=1}^{a}\sum_{j=1}^{b_i} n_{ij}^2/n_{i.}}{\sum_{i=1}^{a} b_i - a},
$$

$$
n_2 = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b_i} n_{ij}^2/n_{i.} - \sum_{i=1}^{a}\sum_{j=1}^{b_i} n_{ij}^2/N}{a-1},
$$

and
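$$
n_3 = \frac{N - \sum_{i=1}^{a} n_{i.}^2/N}{a-1}.
$$

A minimal sketch of the three coefficients is given below. The function name is hypothetical. The first usage line takes the subclass numbers of the breeding example in Section 6.13, and the results agree with the values reported there to about three decimals. The second usage line confirms that a balanced layout gives $n_1 = n_2 = n$ and $n_3 = bn$.

```python
def ems_coefficients(n):
    """n_1, n_2, n_3 of the expected mean squares; n[i] lists the n_ij for level i of A."""
    a = len(n)
    n_i = [sum(row) for row in n]                                     # n_i.
    big_n = sum(n_i)                                                  # N
    sum_b = sum(len(row) for row in n)                                # sum of b_i
    k1 = sum(sum(x * x for x in row) / t for row, t in zip(n, n_i))   # sum_i sum_j n_ij^2 / n_i.
    k2 = sum(x * x for row in n for x in row) / big_n                 # sum_i sum_j n_ij^2 / N
    k3 = sum(t * t for t in n_i) / big_n                              # sum_i n_i.^2 / N
    n1 = (big_n - k1) / (sum_b - a)
    n2 = (k1 - k2) / (a - 1)
    n3 = (big_n - k3) / (a - 1)
    return n1, n2, n3

print(ems_coefficients([[4, 5, 7], [3, 4, 5, 3], [3, 4], [4, 5, 5]]))  # Section 6.13 layout
print(ems_coefficients([[2, 2]] * 15))   # balanced: n1 = n2 = n = 2, n3 = bn = 4
```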


TESTS OF HYPOTHESES 


Under Model I, the null hypothesis of interest, $H_0^B: \beta_{1(i)} = \beta_{2(i)} = \cdots = \beta_{b_i(i)} = 0$, subject to the constraints $\sum_{j=1}^{b_i} n_{ij}\beta_{j(i)} = 0$, $i = 1, 2, \ldots, a$, can be tested by the statistic

$$
F_B = \frac{\mathrm{MS}_{B(A)}}{\mathrm{MS}_E},
$$

which, when $H_0^B$ is true, has an F distribution with $\sum_{i=1}^{a} b_i - a$ and $N - \sum_{i=1}^{a} b_i$ degrees of freedom. Similarly, one may be interested in testing whether the effects at each level of factor A are the same, i.e., $H_0^A: \alpha_1 = \alpha_2 = \cdots = \alpha_a$. The hypothesis $H_0^A$, however, cannot be tested. It can be shown that the statistic

$$
F_A = \frac{\mathrm{MS}_A}{\mathrm{MS}_E},
$$

which under the null hypothesis has an F distribution with $a-1$ and $N - \sum_{i=1}^{a} b_i$ degrees of freedom, tests the hypothesis $H_0: \alpha_i = 0$, $i = 1, 2, \ldots, a$, subject to the constraints $\sum_{i=1}^{a} n_{i.}\alpha_i = 0$ and $\sum_{j=1}^{b_i} n_{ij}\beta_{j(i)} = 0$, $i = 1, 2, \ldots, a$. For some further discussion and derivation of the results, see Searle (1987, Chapter III).
Under Models II and III, there are no simple F tests for the hypotheses relating to factor A effects. This is because, under the null hypothesis, no two mean squares in the analysis of variance table estimate the same quantity: all three mean squares estimate different quantities. In addition, although $\mathrm{MS}_E$ is distributed as a constant times a chi-square random variable, $\mathrm{MS}_{B(A)}$ and $\mathrm{MS}_A$ do not in general have scaled chi-square distributions. Furthermore, $\mathrm{MS}_{B(A)}$ and $\mathrm{MS}_A$ are not statistically independent. Approximate F tests can be developed using the Satterthwaite procedure discussed in Appendix K.⁵ However, a test of the hypothesis

$$
H_0^B: \sigma_\beta^2 = 0 \quad \text{versus} \quad H_1^B: \sigma_\beta^2 > 0
$$

can be carried out by the ratio $\mathrm{MS}_{B(A)}/\mathrm{MS}_E$, which, under $H_0^B$, has an F distribution with $\sum_{i=1}^{a} b_i - a$ and $N - \sum_{i=1}^{a} b_i$ degrees of freedom.


POINT AND INTERVAL ESTIMATION 


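Under Model I, when $H_0^B$ is rejected, it is often of interest to construct simultaneous confidence intervals for $\beta_{j(i)} - \beta_{j'(i)}$, $i = 1, 2, \ldots, a$. For a fixed $i$, the $1-\alpha$ level Bonferroni simultaneous confidence intervals for $m$ independent comparisons of the form $\beta_{j(i)} - \beta_{j'(i)}$ are given by

$$
(\bar{y}_{ij.} - \bar{y}_{ij'.}) \pm t\left[N - \sum_{r=1}^{a} b_r,\; 1-\alpha/2m\right]\sqrt{\left(\frac{1}{n_{ij}} + \frac{1}{n_{ij'}}\right)\mathrm{MS}_E}.
$$

Furthermore, for any given $i$, since there are only $b_i - 1$ independent comparisons of this form, the $1-\alpha$ level Scheffé simultaneous confidence intervals are given by

$$
(\bar{y}_{ij.} - \bar{y}_{ij'.}) \pm \left\{(b_i - 1)F\left[b_i - 1,\; N - \sum_{r=1}^{a} b_r;\; 1-\alpha\right]\left(\frac{1}{n_{ij}} + \frac{1}{n_{ij'}}\right)\mathrm{MS}_E\right\}^{1/2}.
$$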


5 Some authors have ignored the unbalanced structure of the design and have used the conventional F test based on the statistic $\mathrm{MS}_A/\mathrm{MS}_{B(A)}$ with $a-1$ and $\sum_{i=1}^{a} b_i - a$ degrees of freedom (see, e.g., Bliss (1976, p. 353)). A common procedure is to ignore the assumptions of independence and chi-squaredness and construct an approximate F test using a synthesis of mean squares based on the Satterthwaite procedure. For some further discussions and results on tests of hypotheses concerning variance components involving unequal sample sizes, see Cummings and Gaylor (1974), Tietjen (1974), Hussein and Milliken (1978b), Tan and Cheng (1984), Khuri (1987), and Hernandez et al. (1992).
Under Models II and III, if the expected mean squares are equated to the corresponding mean squares in Table 6.4 and the resulting equations are solved for the variance components, the solutions are the so-called analysis of variance estimates, and they are unbiased (Searle (1961)). For example, under Model II, the estimates of the variance components are⁶

$$
\hat\sigma_e^2 = \mathrm{MS}_E,
$$

$$
\hat\sigma_\beta^2 = \frac{1}{n_1}\left(\mathrm{MS}_{B(A)} - \mathrm{MS}_E\right),
$$

and

$$
\hat\sigma_\alpha^2 = \frac{1}{n_3}\left\{\mathrm{MS}_A - \mathrm{MS}_{B(A)} - \frac{n_2 - n_1}{n_1}\left(\mathrm{MS}_{B(A)} - \mathrm{MS}_E\right)\right\}.
$$

The expression $(n_2 - n_1)/n_1$ is usually negligible, in which case the expression for $\hat\sigma_\alpha^2$ reduces to $(\mathrm{MS}_A - \mathrm{MS}_{B(A)})/n_3$. The quantities $n_1$ and $n_2$ can be thought of as kinds of averages of the numbers of observations in the subgroups ($n_{ij}$), and they both reduce to $n$ when $b_i = b$ and $n_{ij} = n$. Similarly, $n_3$ can be thought of as an average of the numbers of observations corresponding to the levels of factor A, and it reduces to $bn$ when $b_i = b$ and $n_{ij} = n$.
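These estimates can be sketched as a small function. The name is hypothetical. The estimates are not truncated, so a negative value can occur, as with any analysis of variance estimator. The first usage line plugs in the mean squares and coefficients of the breeding example in Section 6.13 (Table 6.10). The second uses the balanced pigment paste figures of Section 6.12, where $n_1 = n_2 = n$ and the correction term vanishes.

```python
def anova_estimates(ms_a, ms_b_a, ms_e, n1, n2, n3):
    """ANOVA estimates (sigma_e^2, sigma_beta^2, sigma_alpha^2) for the unbalanced Model II."""
    s2_e = ms_e
    s2_b = (ms_b_a - ms_e) / n1
    s2_a = (ms_a - ms_b_a - (n2 - n1) / n1 * (ms_b_a - ms_e)) / n3
    return s2_e, s2_b, s2_a

print(anova_estimates(556.648, 31.290, 23.745, 4.2695, 4.4094, 12.6795))
print(anova_estimates(86.495, 57.983, 0.917, 2, 2, 4))   # balanced: n1 = n2 = n
```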

Under Model II, turning to confidence intervals for the variance components, an exact confidence interval for $\sigma_e^2$ can be obtained by noting that the statistic $(N - \sum_{i=1}^{a} b_i)\mathrm{MS}_E/\sigma_e^2$ has a chi-square distribution with $N - \sum_{i=1}^{a} b_i$ degrees of freedom. However, exact confidence intervals for $\sigma_\beta^2$ and $\sigma_\alpha^2$ do not exist. A conservative $1-\alpha$ level confidence interval for $\sigma_\beta^2$ can be obtained as


Remark: Approximate confidence intervals for $\sigma_\beta^2$ and $\sigma_\alpha^2$ have been proposed by Hernandez et al. (1992). Similarly, the problem of constructing exact confidence intervals on the ratios of variance components $\sigma_\beta^2/\sigma_e^2$ and $\sigma_\alpha^2/\sigma_e^2$ has been discussed by Seely and El-Bassiouni (1983) and Verdooren (1988). In addition, Burdick and Graybill (1985) and Hernandez and Burdick (1993) have considered confidence intervals for the total variance $\sigma_e^2 + \sigma_\beta^2 + \sigma_\alpha^2$, and Burdick et al. (1986a) and Sen et al. have developed confidence intervals on the proportions of variability $\sigma_e^2/(\sigma_e^2+\sigma_\beta^2+\sigma_\alpha^2)$, $\sigma_\beta^2/(\sigma_e^2+\sigma_\beta^2+\sigma_\alpha^2)$, and $\sigma_\alpha^2/(\sigma_e^2+\sigma_\beta^2+\sigma_\alpha^2)$. For a concise discussion of these and other results on confidence intervals for variance components, including numerical examples, see Burdick and Graybill (1992, pp. 98-109).

6 The analysis of variance estimators for the unbalanced classification do not lead to the same estimates as the maximum likelihood estimators. The maximum likelihood equations are difficult to solve in unbalanced classifications. For some results on other estimation procedures, see Searle (1971b, pp. 475-477).


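Under Model III, we are also in a difficult situation when we want to determine confidence intervals for the general mean $\mu$, the factor A level means $\mu+\alpha_i$, the fixed effects $\alpha_i$, the pairwise differences $\alpha_i - \alpha_{i'}$, or the contrast $\sum_{i=1}^{a}\ell_i\alpha_i$ ($\sum_{i=1}^{a}\ell_i = 0$). For example, under Model III, the variance of a factor A level mean is

$$
\mathrm{Var}(\bar{y}_{i..}) = \frac{n_{i.}\sigma_e^2 + \sum_{j=1}^{b_i} n_{ij}^2\sigma_\beta^2}{n_{i.}^2},
$$

and the variance of the overall mean $\bar{y}_{...}$ is

$$
\mathrm{Var}(\bar{y}_{...}) = \frac{N\sigma_e^2 + \sum_{i=1}^{a}\sum_{j=1}^{b_i} n_{ij}^2\sigma_\beta^2}{N^2}.
$$

Furthermore,

$$
\mathrm{Var}(\bar{y}_{i..} - \bar{y}_{i'..}) = \frac{n_{i.}\sigma_e^2 + \sum_{j=1}^{b_i} n_{ij}^2\sigma_\beta^2}{n_{i.}^2} + \frac{n_{i'.}\sigma_e^2 + \sum_{j=1}^{b_{i'}} n_{i'j}^2\sigma_\beta^2}{n_{i'.}^2}
$$

and

$$
\mathrm{Var}(\bar{y}_{i..} - \bar{y}_{...}) = \frac{N - n_{i.}}{N n_{i.}}\sigma_e^2 + \frac{N - 2n_{i.}}{N n_{i.}^2}\sum_{j=1}^{b_i} n_{ij}^2\sigma_\beta^2 + \frac{1}{N^2}\sum_{r=1}^{a}\sum_{j=1}^{b_r} n_{rj}^2\sigma_\beta^2.
$$

Comparing these variances with the expected mean square expressions in Table 6.4, we notice that there are no simple estimates of the variances. Approximate methods involving the Satterthwaite procedure can, however, be used to obtain the required confidence intervals. Burdick and Graybill (1992, pp. 170-171) give a numerical example illustrating methods of constructing confidence intervals for $\sigma_e^2$, $\sigma_\beta^2$, and $\sigma_\beta^2/\sigma_e^2$.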
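The following sketch evaluates these Model III variances for given $\sigma_e^2$ and $\sigma_\beta^2$. The helper names are hypothetical. The variance components are treated as known here. In practice only estimates are available, and that is exactly why the text turns to Satterthwaite-type approximations.

```python
def var_level_mean(n_i, s2_e, s2_b):
    """Var(ybar_i..) = (n_i. s2_e + sum_j n_ij^2 s2_b) / n_i.^2; n_i lists the n_ij."""
    t = sum(n_i)
    return (t * s2_e + sum(x * x for x in n_i) * s2_b) / t ** 2

def var_grand_mean(n, s2_e, s2_b):
    """Var(ybar...) = (N s2_e + sum_ij n_ij^2 s2_b) / N^2; n[i] lists the n_ij."""
    big_n = sum(sum(row) for row in n)
    return (big_n * s2_e + sum(x * x for row in n for x in row) * s2_b) / big_n ** 2

def var_level_minus_grand(n, i, s2_e, s2_b):
    """Var(ybar_i.. - ybar...) following the last formula above."""
    big_n = sum(sum(row) for row in n)
    t = sum(n[i])
    own = sum(x * x for x in n[i])
    allsq = sum(x * x for row in n for x in row)
    return ((big_n - t) / (big_n * t) * s2_e
            + (big_n - 2 * t) / (big_n * t ** 2) * own * s2_b
            + allsq / big_n ** 2 * s2_b)

# Balanced check: b = n = 2 gives (sigma_e^2 + n sigma_beta^2) / bn = (1 + 6) / 4.
print(var_level_mean([2, 2], 1.0, 3.0))
```

The last function agrees with $\mathrm{Var}(\bar{y}_{i..}) + \mathrm{Var}(\bar{y}_{...}) - 2\,\mathrm{Cov}(\bar{y}_{i..}, \bar{y}_{...})$. Here the covariance is $(n_{i.}\sigma_e^2 + \sum_j n_{ij}^2\sigma_\beta^2)/(n_{i.}N)$, because $\bar{y}_{...}$ contains the $i$-th level's observations with weight $1/N$.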


Remark: In designing an experiment involving subsamples, it is suggested that subsamples of equal size be used whenever possible. If, during the course of the study, certain data are missing and the numbers in the various subsamples are unequal, then, after some consideration of why the data are missing, the results can be analyzed by using the means of the subsamples. Such an analysis violates the assumption of equal variances, but the variances will be approximately equal. A drawback in analyzing the cell means is that we no longer have unbiased estimates of the variance components. Since F tests are relatively robust against heterogeneous variances, small differences in the $n_{ij}$ would not affect the conclusions. An advantage of the cell means procedure is that the values tend to be normally distributed in view of the central limit theorem. If, however, the assumption of normality is not in question and we wish to estimate variance components, an alternative procedure is as follows. Missing values are replaced by the corresponding cell means, and an analysis is carried out assuming an equal number of observations. The degrees of freedom corresponding to the residual or error mean square are decreased by one for each missing value. The reason why the values are missing should always be taken into account: it is appropriate to analyze the remaining data only if the loss of the missing data can be ascribed to chance. The reader is referred to Yates (1934) for further discussion on this point.

TABLE 6.5
Weight Gains of Chickens Placed on Four Feeding Treatments

Treatments                      LoCaLoL         LoCaHiL         HiCaLoL         HiCaHiL
Pens                           1      2        1      2        1      2        1      2
Weight gains                 573  1,041      618    943      731    416      518    416
                             613    636      659    734      770    776      672    776
                             901    685      817  1,050      787    657      576    657
Pen totals (y_ij.)         4,156  4,564    4,414  4,647    4,728  3,720    4,241  3,720
Treatment totals (y_i..)          8,720           9,061           8,448           7,961

Source: Damon and Harvey (1987, p. 26). Used with permission.


6.11 WORKED EXAMPLE FOR MODEL I


Damon and Harvey (1987, p. 26) reported data on weight gains of chickens 
placed on four feeding treatments. The original data were supplied by Dr. 
Donald L. Anderson of the Department of Veterinary and Animal Sciences at 
the University of Massachusetts. The experiment involved the determination 
of weight gains (in grams) from 10 to 20 weeks of chickens placed on four 
feeding treatments obtained from combinations of high and low calcium and 
lysine. Weight determinations were made on six chickens in each of two pens for
each of the four feeding treatments. The data are given in Table 6.5.

The data in Table 6.5 can be regarded as forming a two-way nested clas- 
sification with pens nested within treatments and weight determinations made 
using six chickens from each pen. Here, treatments are fixed, and although pens 
are usually selected randomly, we analyze the data under the assumptions of 
a fixed effects model where both treatments and pens are considered to have 
systematic effects. The mathematical model would be

$$
y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + e_{k(ij)}, \qquad i = 1, 2, 3, 4;\; j = 1, 2;\; k = 1, 2, \ldots, 6,
$$

where $y_{ijk}$ is the $k$-th observation in the $j$-th pen and the $i$-th treatment, $\mu$ is the general mean, $\alpha_i$ is the effect of the $i$-th treatment, $\beta_{j(i)}$ is the effect of the $j$-th pen nested within the $i$-th treatment, and $e_{k(ij)}$ is the customary error term associated with the $k$-th observation nested within the $j$-th pen within the $i$-th treatment. Also, the $\alpha_i$'s and $\beta_{j(i)}$'s are assumed to be fixed effects with $\sum_{i=1}^{4}\alpha_i = 0$ and $\sum_{j=1}^{2}\beta_{j(i)} = 0$, $i = 1, 2, 3, 4$, and the $e_{k(ij)}$'s are assumed to be independently and normally distributed with mean zero and variance $\sigma^2$.


Using the computational formulae for the sums of squares given in Section 6.7, we have

$$
\mathrm{SS}_T = (573)^2 + (636)^2 + \cdots + (657)^2 - \frac{(34{,}190)^2}{4\times 2\times 6} = 25{,}515{,}538 - 24{,}353{,}252.083 = 1{,}162{,}285.917,
$$

$$
\mathrm{SS}_A = \frac{(8{,}720)^2 + (9{,}061)^2 + (8{,}448)^2 + (7{,}961)^2}{2\times 6} - \frac{(34{,}190)^2}{4\times 2\times 6} = 24{,}407{,}195.500 - 24{,}353{,}252.083 = 53{,}943.417,
$$

$$
\mathrm{SS}_{B(A)} = \frac{(4{,}156)^2 + (4{,}564)^2 + \cdots + (3{,}720)^2}{6} - \frac{(8{,}720)^2 + (9{,}061)^2 + (8{,}448)^2 + (7{,}961)^2}{2\times 6} = 24{,}532{,}883.667 - 24{,}407{,}195.500 = 125{,}688.167,
$$

and

$$
\mathrm{SS}_E = (573)^2 + (636)^2 + \cdots + (657)^2 - \frac{(4{,}156)^2 + (4{,}564)^2 + \cdots + (3{,}720)^2}{6} = 25{,}515{,}538 - 24{,}532{,}883.667 = 982{,}654.333.
$$

These results along with the remaining calculations are summarized in Table 6.6.


TABLE 6.6
Analysis of Variance for the Weight Gains Data of Table 6.5

Source of              Degrees of   Sum of          Mean         Expected
Variation              Freedom      Squares         Square       Mean Square             F Value   p-Value
Treatments              3              53,943.417   17,981.139   σ² + (12/3)Σα_i²        0.732     0.539
Pens (within            4             125,688.167   31,422.042   σ² + (6/4)ΣΣβ_j(i)²     1.279     0.294
  treatments)
Error                  40             982,654.333   24,566.358   σ²
Total                  47           1,162,285.917


Two-Way Nested (Hierarchical) Classification 371 


The test of the hypothesis $H_0^B$: all $\beta_{j(i)} = 0$ versus $H_1^B$: not all $\beta_{j(i)} = 0$ gives the variance ratio of 1.279, which is not significant ($p = 0.294$). Similarly, the test of the hypothesis $H_0^A$: all $\alpha_i = 0$ versus $H_1^A$: not all $\alpha_i = 0$ gives the variance ratio of 0.732, which is also not significant ($p = 0.539$). Thus, there do not seem to be any significant differences in either treatment or pen effects.
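The Model I computations above can be reproduced from the pen totals of Table 6.5 and the uncorrected sum of squares 25,515,538 quoted in this section. The function name is hypothetical. Under Model I both effects are tested against $\mathrm{MS}_E$. The three component sums of squares add up to the total of 1,162,285.917. The p-values would additionally need an F distribution routine and are not computed here.

```python
def model1_from_totals(pen_totals, n, raw_ss):
    """Balanced Model I analysis from cell totals y_ij. and the raw sum of squares."""
    a, b = len(pen_totals), len(pen_totals[0])
    grand = sum(sum(row) for row in pen_totals)
    cf = grand ** 2 / (a * b * n)                                  # y...^2 / abn
    cells = sum(t * t for row in pen_totals for t in row) / n
    levels = sum(sum(row) ** 2 for row in pen_totals) / (b * n)
    ss_a, ss_b, ss_e = levels - cf, cells - levels, raw_ss - cells
    ms_a = ss_a / (a - 1)
    ms_b = ss_b / (a * (b - 1))
    ms_e = ss_e / (a * b * (n - 1))
    # Under Model I both effects are tested against MS_E.
    return {"SS_A": ss_a, "SS_B(A)": ss_b, "SS_E": ss_e, "SS_T": raw_ss - cf,
            "F_A": ms_a / ms_e, "F_B(A)": ms_b / ms_e}

res = model1_from_totals([[4156, 4564], [4414, 4647], [4728, 3720], [4241, 3720]],
                         6, 25_515_538)
print({k: round(v, 3) for k, v in res.items()})
```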


6.12 WORKED EXAMPLE FOR MODEL II 


Box et al. (1978, pp. 574-575) reported data from an experiment designed to
estimate the moisture content of a pigment paste. For this purpose, 15 batches
of pigment paste used in the manufacture were randomly selected, each batch 
was independently sampled twice, and for each sample, two analyses were 
performed. The data are given in Table 6.7. 

The model for this experiment is a two-way nested classification with samples 
nested within batches and two analyses (subsamples) made from each sample. 
We will analyze the data under the assumptions of a random effects model 
since both batches and samples are considered to have variable effects. The 
mathematical model would be 


$$
y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + e_{k(ij)}, \qquad i = 1, 2, \ldots, 15;\; j = 1, 2;\; k = 1, 2,
$$

where $y_{ijk}$ is the $k$-th observation (analysis) in the $j$-th sample in the $i$-th batch, $\mu$ is the general mean, $\alpha_i$ is the effect of the $i$-th batch, $\beta_{j(i)}$ is the effect of the $j$-th sample nested within the $i$-th batch, and $e_{k(ij)}$ is the effect of the $k$-th subsample nested within the $j$-th sample within the $i$-th batch (error term). Furthermore, the $\alpha_i$'s, $\beta_{j(i)}$'s, and $e_{k(ij)}$'s are assumed to be independently and normally distributed with mean zero and variances $\sigma_\alpha^2$, $\sigma_\beta^2$, and $\sigma_e^2$, respectively.

Using the computational formulae for the sums of squares given in Section 6.7, we have

$$
\mathrm{SS}_T = (40)^2 + (39)^2 + \cdots + (28)^2 - \frac{(1{,}607)^2}{15\times 2\times 2} = 45{,}149 - 43{,}040.817 = 2{,}108.183,
$$

$$
\mathrm{SS}_A = \frac{(139)^2 + (105)^2 + \cdots + (130)^2}{2\times 2} - \frac{(1{,}607)^2}{15\times 2\times 2} = 44{,}251.750 - 43{,}040.817 = 1{,}210.933,
$$

$$
\mathrm{SS}_{B(A)} = \frac{(79)^2 + (60)^2 + \cdots + (54)^2}{2} - \frac{(139)^2 + (105)^2 + \cdots + (130)^2}{2\times 2} = 45{,}121.500 - 44{,}251.750 = 869.750,
$$

and

$$
\mathrm{SS}_E = (40)^2 + (39)^2 + \cdots + (28)^2 - \frac{(79)^2 + (60)^2 + \cdots + (54)^2}{2} = 45{,}149 - 45{,}121.500 = 27.500.
$$

TABLE 6.7
Moisture Content from Two Analyses on Two Samples of 15 Batches of Pigment Paste

Batch   Sample 1 analyses   Sample 2 analyses   Analysis totals (y_ij.)   Sample totals (y_i..)
  1     40   39             30   30             79   60                   139
  2     26   28             25   26             54   51                   105
  3     29   28             14   15             57   29                    86
  4     30   31             24   24             61   48                   109
  5     19   20             17   17             39   34                    73
  6     33   32             26   24             65   50                   115
  7     23   24             32   33             47   65                   112
  8     34   34             29   29             68   58                   126
  9     27   27             31   31             54   62                   116
 10     13   16             27   24             29   51                    80
 11     25   23             25   27             48   52                   100
 12     29   29             31   32             58   63                   121
 13     19   20             29   30             39   59                    98
 14     23   24             25   25             47   50                    97
 15     39   37             26   28             76   54                   130

Source: Box et al. (1978, pp. 574-575). Used with permission.
TABLE 6.8
Analysis of Variance for the Moisture Content of Pigment Paste Data of Table 6.7

Source of              Degrees of   Sum of          Mean         Expected
Variation              Freedom      Squares         Square       Mean Square             F Value   p-Value
Batches                14               1,210.933       86.495   σ_e² + 2σ_β² + 4σ_α²    1.492     0.226
Samples                15                 869.750       57.983   σ_e² + 2σ_β²            63.231    <0.001
  (within batches)
Error                  30                  27.500        0.917   σ_e²
Total                  59               2,108.183


These results along with the remaining calculations are summarized in Table 6.8. The test of the hypothesis $H_0^B: \sigma_\beta^2 = 0$ versus $H_1^B: \sigma_\beta^2 > 0$ gives the variance ratio of 63.231, which is highly significant ($p < 0.001$). However, the test of the hypothesis $H_0^A: \sigma_\alpha^2 = 0$ versus $H_1^A: \sigma_\alpha^2 > 0$ gives the F ratio of 1.492, which is not significant ($p = 0.226$). Thus, we reject the first null hypothesis but not the second. The point estimates of the variance components are obtained as

$$
\hat\sigma_e^2 = 0.917,
$$

$$
\hat\sigma_\beta^2 = \tfrac{1}{2}(57.983 - 0.917) = 28.533,
$$

and

$$
\hat\sigma_\alpha^2 = \tfrac{1}{4}(86.495 - 57.983) = 7.128.
$$


These variance components account for 2.5, 78.0, and 19.5 percent, respectively, of the total variation in the experimental data. The findings suggest that perhaps the largest single source of variability is the error arising in chemical sampling from the batches. The batch-to-batch variability also seems to be quite large, although it is not statistically significant. To estimate the mean of a batch, the estimated variance based on one subsample (analysis) from one sample would be

$$
\hat\sigma_e^2 + \hat\sigma_\beta^2 = 0.917 + 28.533 = 29.450.
$$

The estimated variance based on two subsamples from one sample would be $\hat\sigma_e^2/2 + \hat\sigma_\beta^2 = 0.917/2 + 28.533 = 28.992$. Thus, there is very little gain in the precision of the estimate of a batch mean from performing two analyses on a sample rather than one. The use of two analyses may, however, be useful as a check against any major errors.
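The estimates, their percentage shares, and the two precision figures can be reproduced with a short sketch. The function names are hypothetical. The helper `var_batch_mean` generalizes the two cases in the text to $s$ samples with $k$ analyses each, giving $\hat\sigma_\beta^2/s + \hat\sigma_e^2/(sk)$. This generalization is an added assumption: it takes the batch mean to be the simple average of all analyses of that batch.

```python
def variance_components_balanced(ms_a, ms_b_a, ms_e, b, n):
    """ANOVA estimates for the balanced random model (Model II)."""
    s2_e = ms_e
    s2_b = (ms_b_a - ms_e) / n
    s2_a = (ms_a - ms_b_a) / (b * n)
    return s2_e, s2_b, s2_a

def var_batch_mean(s2_e, s2_b, samples, analyses):
    """Variance of a batch mean from `samples` samples with `analyses` analyses each."""
    return s2_b / samples + s2_e / (samples * analyses)

s2_e, s2_b, s2_a = variance_components_balanced(86.495, 57.983, 0.917, b=2, n=2)
total = s2_e + s2_b + s2_a
shares = [round(100 * v / total, 1) for v in (s2_e, s2_b, s2_a)]
print(shares)
print(var_batch_mean(s2_e, s2_b, 1, 1), var_batch_mean(s2_e, s2_b, 1, 2))
```

The sketch makes the design conclusion visible: extra analyses only divide the small $\hat\sigma_e^2$, while extra samples would divide the dominant $\hat\sigma_\beta^2$.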
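Finally, we can obtain confidence limits for the overall mean $\mu$ of the process as follows. We have

$$
\hat\mu = \bar{y}_{...} = 26.783, \qquad \mathrm{Var}(\bar{y}_{...}) = \frac{\sigma_e^2 + n\sigma_\beta^2 + bn\sigma_\alpha^2}{abn},
$$

and

$$
\widehat{\mathrm{Var}}(\bar{y}_{...}) = \frac{\mathrm{MS}_A}{abn} = \frac{86.495}{15\times 2\times 2} = 1.442.
$$

Now, since

$$
\frac{\bar{y}_{...} - \mu}{\sqrt{\widehat{\mathrm{Var}}(\bar{y}_{...})}} \sim t[a-1] \quad \text{and} \quad t[14,\,0.975] = 2.145,
$$

the 95 percent confidence limits for $\mu$ are obtained as

$$
26.783 \pm 2.145\sqrt{1.442} = (24.207,\; 29.359).
$$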


6.13 WORKED EXAMPLE FOR MODEL II: UNEQUAL NUMBERS IN THE SUBCLASSES

To give an example of Model II involving unequal sample sizes, consider the data in Table 6.9 taken from Graybill (1961, pp. 357-358). The data are artificial, but the experiment is supposed to represent a breeding experiment where factor A designates sires ($a = 4$) and factor B designates dams ($b_1 = 3$, $b_2 = 4$, $b_3 = 2$, $b_4 = 3$) nested within sires. There are a total of $N = 52$ observations. The data in Table 6.9 can be regarded as forming a two-way nested classification with unequal sample sizes. Here, dams are nested within sires and sample determinations are made from each dam. Since both dams and sires are randomly selected, the data should be analyzed using a random effects model. The mathematical model would be

$$
y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + e_{k(ij)}, \qquad i = 1, 2, 3, 4;\; j = 1, 2, \ldots, b_i;\; k = 1, 2, \ldots, n_{ij},
$$

where $y_{ijk}$ is the $k$-th observation for the $j$-th dam in the $i$-th sire, $\mu$ is the general mean, $\alpha_i$ is the effect of the $i$-th sire, $\beta_{j(i)}$ is the effect of the $j$-th dam nested within the $i$-th sire, and $e_{k(ij)}$ is the effect of the $k$-th observation nested within the $j$-th dam within the $i$-th sire (error term). Furthermore, the $\alpha_i$'s, $\beta_{j(i)}$'s, and $e_{k(ij)}$'s are assumed to be independently and normally distributed with mean zero and variances $\sigma_\alpha^2$, $\sigma_\beta^2$, and $\sigma_e^2$, respectively.
TABLE 6.9
Data on Weight Gains from a Breeding Experiment

Sires                             1                2                3                4
Dams                         1    2    3      1    2    3    4      1    2      1    2    3
Weight gains                32   30   34     26   22   23   21     16   14     31   42   26
                            31   26   30     20   31   21   21     20   18     34   43   25
                            23   29   26     18   20   24   30     32   16     41   40   29
                            26   28   34          21   26               17     40   35   40
                                 18   32               18                          29   37
                                      31
                                      26
Dam totals (y_ij.)         112  131  213     64   94  112   72     68   65    146  189  157
No. in j-th dam (n_ij)       4    5    7      3    4    5    3      3    4      4    5    5
  (within i-th sire)
Sire totals (y_i..)             456              342              133              492
No. in i-th sire (n_i.)          16               15                7               14

Source: Graybill (1961, p. 358). Used with permission.

All the quantities needed for the analysis of variance computations outlined in Section 6.10 can be readily computed on an electronic calculator. The results are:

$$
\frac{y_{...}^2}{N} = \frac{(1{,}423)^2}{52} = 38{,}940.942,
$$

$$
\sum_{i=1}^{a}\sum_{j=1}^{b_i}\frac{y_{ij.}^2}{n_{ij}} = 40{,}861.201,
$$

and

$$
\sum_{i=1}^{a}\frac{y_{i..}^2}{n_{i.}} = 40{,}610.885.
$$


Thus, with $\sum\sum\sum y_{ijk}^2 = 41{,}811$,

$$
\mathrm{SS}_T = 41{,}811 - 38{,}940.942 = 2{,}870.058,
$$

$$
\mathrm{SS}_A = 40{,}610.885 - 38{,}940.942 = 1{,}669.943,
$$

$$
\mathrm{SS}_{B(A)} = 40{,}861.201 - 40{,}610.885 = 250.316,
$$

and

$$
\mathrm{SS}_E = 41{,}811 - 40{,}861.201 = 949.799.
$$

The corresponding degrees of freedom are determined as

Total: $N - 1 = 52 - 1 = 51$,
Sires: $a - 1 = 4 - 1 = 3$,
Dams (within sires): $\sum_{i=1}^{a} b_i - a = 12 - 4 = 8$,
Error: $N - \sum_{i=1}^{a} b_i = 52 - 12 = 40$.


For approximate tests of significance, we need to evaluate the coefficients of the variance components in the expected mean squares and then determine the linear combination of mean squares to be used as the denominator of an approximate F statistic using the Satterthwaite procedure. The basic quantities needed to determine the coefficients of the variance components are:

$$
N = \sum_{i=1}^{a}\sum_{j=1}^{b_i} n_{ij} = 52,
$$

$$
\sum_{i=1}^{a}\frac{n_{i.}^2}{N} = \frac{(16)^2 + (15)^2 + (7)^2 + (14)^2}{52} = 13.9615,
$$

$$
\sum_{i=1}^{a}\sum_{j=1}^{b_i}\frac{n_{ij}^2}{N} = \frac{(4)^2 + (5)^2 + \cdots + (5)^2}{52} = 4.6158,
$$

and

$$
\sum_{i=1}^{a}\frac{\sum_{j=1}^{b_i} n_{ij}^2}{n_{i.}} = \frac{(4)^2 + (5)^2 + (7)^2}{16} + \cdots + \frac{(4)^2 + (5)^2 + (5)^2}{14} = 17.8441.
$$

Now, the coefficients of the variance components in the expected mean square column are given by

$$
n_1 = \frac{N - \sum_{i}\sum_{j} n_{ij}^2/n_{i.}}{\sum_{i=1}^{a}(b_i - 1)} = \frac{52 - 17.8441}{8} = 4.2695,
$$

$$
n_2 = \frac{\sum_{i}\sum_{j} n_{ij}^2/n_{i.} - \sum_{i}\sum_{j} n_{ij}^2/N}{a - 1} = \frac{17.8441 - 4.6158}{4 - 1} = 4.4094,
$$

and

$$
n_3 = \frac{N - \sum_{i} n_{i.}^2/N}{a - 1} = \frac{52 - 13.9615}{4 - 1} = 12.6795.
$$

The resulting sums of squares, mean squares, and expected mean squares are summarized in Table 6.10.

TABLE 6.10
Analysis of Variance for the Weight Gains Data of Table 6.9

Source of              Degrees of   Sum of          Mean         Expected
Variation              Freedom      Squares         Square       Mean Square
Sires                   3               1,669.943      556.648   σ_e² + 4.4094σ_β² + 12.6795σ_α²
Dams (within sires)     8                 250.316       31.290   σ_e² + 4.2695σ_β²
Error                  40                 949.799       23.745   σ_e²
Total                  51               2,870.058

The dams within sires can be tested directly against the error mean square, giving $F = 31.290/23.745 = 1.318$ ($p = 0.262$). The result is clearly nonsignificant. An approximate F test for sire effects can be carried out using the dams within sires mean square, giving $F = 556.648/31.290 = 17.790$ ($p < 0.001$), which is highly significant. To use the Satterthwaite procedure, however, we first compute the coefficients

$$
\ell_2 = n_2/n_1 = 4.4094/4.2695 = 1.0328, \qquad \ell_1 = 1 - \ell_2 = -0.0328.
$$

They are chosen so that the synthesized mean square $\ell_1\mathrm{MS}_E + \ell_2\mathrm{MS}_{B(A)}$ has expectation $\sigma_e^2 + n_2\sigma_\beta^2$, which equals $E(\mathrm{MS}_A)$ when $\sigma_\alpha^2 = 0$. The synthesized mean square is

$$
-0.0328(23.745) + 1.0328(31.290) = 31.537.
$$

The degrees of freedom for the synthesized mean square are

$$
\nu = \frac{(31.537)^2}{\dfrac{[-0.0328(23.745)]^2}{40} + \dfrac{[1.0328(31.290)]^2}{8}} \approx 7.62.
$$

The F ratio based on the synthesized mean square is $F = 556.648/31.537 = 17.651$ ($p < 0.001$), which gives essentially the same result as the earlier approximate test.
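The Satterthwaite step can be checked with a short sketch. The function name `satterthwaite` is hypothetical. It returns the synthesized mean square $\sum_r c_r\mathrm{MS}_r$ and its approximate degrees of freedom $(\sum_r c_r\mathrm{MS}_r)^2 / \sum_r (c_r\mathrm{MS}_r)^2/\nu_r$. The usage lines take the coefficients and mean squares of Table 6.10.

```python
def satterthwaite(coefs, mean_squares, dfs):
    """Synthesized mean square sum(c_r MS_r) and its Satterthwaite degrees of freedom."""
    terms = [c * m for c, m in zip(coefs, mean_squares)]
    synth = sum(terms)
    nu = synth ** 2 / sum(t * t / d for t, d in zip(terms, dfs))
    return synth, nu

n1, n2 = 4.2695, 4.4094                  # coefficients from Table 6.10
l2 = n2 / n1
l1 = 1.0 - l2                            # so that l1 + l2 = 1 (sigma_e^2 coefficient)
ms_syn, nu = satterthwaite([l1, l2], [23.745, 31.290], [40, 8])
f_ratio = 556.648 / ms_syn
print(round(ms_syn, 3), round(nu, 2), round(f_ratio, 3))
```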

The estimates of the variance components $\sigma_\alpha^2$, $\sigma_\beta^2$, and $\sigma_e^2$ are obtained as the solution to the following simultaneous equations:

$$
556.648 = \sigma_e^2 + 4.4094\sigma_\beta^2 + 12.6795\sigma_\alpha^2,
$$

$$
31.290 = \sigma_e^2 + 4.2695\sigma_\beta^2,
$$

and

$$
23.745 = \sigma_e^2.
$$

Therefore, the desired estimates are given by

$$
\hat\sigma_e^2 = 23.745, \qquad \hat\sigma_\beta^2 = \frac{31.290 - 23.745}{4.2695} = 1.767,
$$

and

$$
\hat\sigma_\alpha^2 = \frac{556.648 - 23.745 - 4.4094(1.767)}{12.6795} = 41.414.
$$

The components $\sigma_e^2$, $\sigma_\beta^2$, and $\sigma_\alpha^2$ account for 35.5, 2.6, and 61.9 percent, respectively, of the total variation in the experimental data. The variance component estimates are consistent with the results of the tests of hypotheses. It is further evident from this analysis that the larger part of the variability in weight gains is attributable to sires. The variability between repeated measurements on a given dam is also quite large.


6.14 WORKED EXAMPLE FOR MODEL III 


Snedecor and Cochran (1989, p. 250) reported data from an experiment designed to evaluate the breeding value of a set of five sires in raising pigs. Each sire was mated to two dams randomly selected from a group of dams, and the average daily weight gains of two pigs from each litter were recorded. The data are given in Table 6.11.

The data in Table 6.11 can be regarded as a two-way nested classification with dams nested within sires and average daily gains recorded on two pigs of each litter. Here, sires are fixed and dams are random, so we have a Model III situation. The mathematical model for the experiment would be

$$
y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + e_{k(ij)}, \qquad i = 1, 2, \ldots, 5;\; j = 1, 2;\; k = 1, 2, \qquad (6.14.1)
$$
TABLE 6.11
Average Daily Weight Gains of Two Pigs of Each Litter

Sires                         1             2             3             4             5
Dams                      1      2      1      2      1      2      1      2      1      2
Weight gains           2.77   2.58   2.28   3.01   2.36   2.72   2.87   2.31   2.74   2.50
                       2.38   2.94   2.22   2.61   2.71   2.74   2.46   2.24   2.56   2.48
Dam totals (y_ij.)     5.15   5.52   4.50   5.62   5.07   5.46   5.33   4.55   5.30   4.98
Sire totals (y_i..)       10.67         10.12         10.53          9.88         10.28

Source: Snedecor and Cochran (1989, p. 250). Used with permission.


where $\mu$ is the general mean, $\alpha_i$ is the effect of the $i$-th sire, $\beta_{j(i)}$ is the effect of the $j$-th dam nested within the $i$-th sire, and $e_{k(ij)}$ is the effect of the $k$-th observation nested within the $j$-th dam within the $i$-th sire (error term). Furthermore, the $\alpha_i$'s are fixed with $\sum_{i=1}^{5}\alpha_i = 0$, and the $\beta_{j(i)}$'s and $e_{k(ij)}$'s are assumed to be independently and normally distributed with mean zero and variances $\sigma_\beta^2$ and $\sigma_e^2$, respectively.

To analyze the data of Table 6.11 according to model (6.1.1), the sums of squares using the computational formulae of Section 6.7 are obtained as

$$
\mathrm{SS}_T = (2.77)^2 + (2.38)^2 + \cdots + (2.48)^2 - \frac{(51.48)^2}{5\times 2\times 2} = 133.5598 - 132.5095 = 1.0503,
$$

$$
\mathrm{SS}_A = \frac{(10.67)^2 + (10.12)^2 + \cdots + (10.28)^2}{2\times 2} - \frac{(51.48)^2}{5\times 2\times 2} = 132.6092 - 132.5095 = 0.0997,
$$

$$
\mathrm{SS}_{B(A)} = \frac{(5.15)^2 + (5.52)^2 + \cdots + (4.98)^2}{2} - \frac{(10.67)^2 + (10.12)^2 + \cdots + (10.28)^2}{2\times 2} = 133.1728 - 132.6092 = 0.5636,
$$

and

$$
\mathrm{SS}_E = (2.77)^2 + (2.38)^2 + \cdots + (2.48)^2 - \frac{(5.15)^2 + (5.52)^2 + \cdots + (4.98)^2}{2} = 133.5598 - 133.1728 = 0.3870.
$$

TABLE 6.12
Analysis of Variance for the Weight Gains Data of Table 6.11

Source of              Degrees of   Sum of          Mean         Expected
Variation              Freedom      Squares         Square       Mean Square                 F Value   p-Value
Sires                   4                  0.0997       0.0249   σ_e² + 2σ_β² + (4/4)Σα_i²   0.221     0.916
Dams (within sires)     5                  0.5636       0.1127   σ_e² + 2σ_β²                2.912     0.071
Error                  10                  0.3870       0.0387   σ_e²
Total                  19                  1.0503


These results together with the remaining calculations are shown in Table 6.12. The test of the hypothesis $H_0^B: \sigma_\beta^2 = 0$ versus $H_1^B: \sigma_\beta^2 > 0$ gives the variance ratio of 2.912, which is less than its 5 percent critical value of 3.33 ($p = 0.071$). Similarly, the test of the hypothesis $H_0^A$: all $\alpha_i = 0$ versus $H_1^A$: $\alpha_i \neq 0$ for at least one $i = 1, 2, \ldots, 5$ gives the variance ratio of 0.221, which again falls substantially below its critical value of 5.19 at the 5 percent level ($p = 0.916$). Thus, we may conclude that there is probably no significant effect of either sires or dams within sires on average daily weight gains in these data. The estimates of the variance components $\sigma_e^2$ and $\sigma_\beta^2$ are given by

$$
\hat\sigma_e^2 = 0.0387
$$

and

$$
\hat\sigma_\beta^2 = \tfrac{1}{2}(0.1127 - 0.0387) = 0.037.
$$

The variance component estimates are consistent with the results of the tests of hypotheses given previously.
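As a check on this Model III analysis, the sketch below recomputes it from the individual observations of Table 6.11. The function name is hypothetical. Under Model III the fixed sire effect is tested against $\mathrm{MS}_{B(A)}$ and the random dam effect against $\mathrm{MS}_E$, matching the expected mean squares of Table 6.12.

```python
def nested_anova_model3(y):
    """Balanced nested ANOVA with A fixed and B(A) random (Model III); y[i][j][k]."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    obs = [v for level in y for cell in level for v in cell]
    cf = sum(obs) ** 2 / (a * b * n)
    raw = sum(v * v for v in obs)
    cell_totals = [[sum(cell) for cell in level] for level in y]
    cells = sum(t * t for row in cell_totals for t in row) / n
    levels = sum(sum(row) ** 2 for row in cell_totals) / (b * n)
    ss = {"A": levels - cf, "B(A)": cells - levels, "E": raw - cells, "T": raw - cf}
    ms_a = ss["A"] / (a - 1)
    ms_b = ss["B(A)"] / (a * (b - 1))
    ms_e = ss["E"] / (a * b * (n - 1))
    return ss, {
        "F_A": ms_a / ms_b,                    # sires tested against dams within sires
        "F_B(A)": ms_b / ms_e,                 # dams tested against error
        "sigma2_beta": (ms_b - ms_e) / n,      # ANOVA estimate of sigma_beta^2
    }

pigs = [
    [[2.77, 2.38], [2.58, 2.94]], [[2.28, 2.22], [3.01, 2.61]],
    [[2.36, 2.71], [2.72, 2.74]], [[2.87, 2.46], [2.31, 2.24]],
    [[2.74, 2.56], [2.50, 2.48]],
]
ss, stats = nested_anova_model3(pigs)
print({k: round(v, 4) for k, v in ss.items()}, {k: round(v, 3) for k, v in stats.items()})
```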
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6.15 USE OF STATISTICAL COMPUTING PACKAGES 


For balanced nested designs involving only random factors, SAS NESTED is the procedure of choice. Although the NESTED procedure performs the F tests assuming a completely random model, its sums of squares and mean squares remain equally valid for fixed and mixed model analyses. If some of the factors are crossed or any factor is fixed, PROC ANOVA is more appropriate for a fixed effects factorial model with a balanced structure, while GLM is better suited to a random or mixed effects model involving balanced or unbalanced data sets. In GLM, random and mixed model analyses can be handled via the RANDOM statement with its TEST option and via the TEST statement. For balanced designs, analysis of variance estimates of variance components are readily obtained from the output produced by either the NESTED or GLM procedure. For other methods of estimation of variance components, PROC MIXED or VARCOMP must be used. For instructions regarding SAS commands, see Section 11.1.
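The practical consequence of the model choice is the denominator of each F ratio. The sketch below (the function name is hypothetical) shows both choices using the SAS GLM mean squares of Figure 6.6(i): the default GLM table tests SIRE against the error mean square (F = 0.64), whereas the TEST statement with E=DAM(SIRE) gives the mixed model ratio (F = 0.22).

```python
def nested_f_ratios(ms_a, ms_ba, ms_e, b_random):
    """F ratios for A and B(A) in a balanced two-way nested design.

    With B(A) fixed (Model I) both effects are tested against MS_E; with B(A)
    random (Model II, or Model III with A fixed) A is tested against MS_B(A).
    """
    f_ba = ms_ba / ms_e
    f_a = ms_a / (ms_ba if b_random else ms_e)
    return f_a, f_ba

# Type III mean squares printed by SAS GLM in Figure 6.6(i).
ms_sire, ms_dam, ms_err = 0.0249325, 0.11271, 0.0387
print(nested_f_ratios(ms_sire, ms_dam, ms_err, b_random=False))  # SIRE F about 0.64
print(nested_f_ratios(ms_sire, ms_dam, ms_err, b_random=True))   # SIRE F about 0.22
```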

Among the SPSS procedures, either MANOVA or GLM can be used for nested designs involving fixed, random, or mixed effects models. In SPSS GLM, a random or mixed effects analysis of variance is requested with the RANDOM subcommand, and each effect is automatically tested against the appropriate error term. In addition, GLM displays the expected values of all the mean squares, which can be used to estimate variance components. Furthermore, SPSS Release 7.5 incorporates a new procedure, VARCOMP, designed especially to estimate variance components. For instructions regarding SPSS commands, see Section 11.2.

Among the BMDP programs, 3V or 8V can be used for nested designs. 8V is designed especially for balanced data sets, while 3V analyzes a general mixed model with balanced or unbalanced designs. In 3V, the procedures for estimating variance components include the restricted maximum likelihood and the maximum likelihood estimators. In balanced designs, if the estimates obtained via the analysis of variance approach are nonnegative, they agree with those obtained using the restricted maximum likelihood procedure. The program 2V does not directly give the sums of squares for nested factors. However, the sums of squares from a crossed analysis can be combined to produce the desired sums of squares in a nested design; for example, the sum of squares for B within A equals the sum of the B and A × B sums of squares.
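The last point can be illustrated with a small sketch: treating the within-A label j as a crossed factor B, the crossed sums of squares for B and A × B add up to the nested sum of squares for B within A. The data and the function name are hypothetical, and this is plain Python, not BMDP 2V code.

```python
def crossed_ss(y):
    """Crossed two-way sums of squares (A, B, AB) for balanced data y[i][j][k].

    The within-A label j is treated as if it were a crossed factor B, which is
    how a crossed-factor program would see nested data.
    """
    a, b, n = len(y), len(y[0]), len(y[0][0])
    cf = sum(v for yi in y for yij in yi for v in yij) ** 2 / (a * b * n)
    ss_a = sum(sum(map(sum, yi)) ** 2 for yi in y) / (b * n) - cf
    ss_b = sum(sum(sum(y[i][j]) for i in range(a)) ** 2 for j in range(b)) / (a * n) - cf
    ss_cells = sum(sum(yij) ** 2 for yi in y for yij in yi) / n - cf
    ss_ab = ss_cells - ss_a - ss_b
    return ss_a, ss_b, ss_ab

toy = [[[1, 2], [3, 5]], [[4, 4], [6, 8]]]   # hypothetical data, a = b = n = 2
ss_a, ss_b, ss_ab = crossed_ss(toy)
ss_b_within_a = ss_b + ss_ab                  # nested SS for B within A
print(ss_a, ss_b, ss_ab, ss_b_within_a)       # 15.125 15.125 0.125 15.25
```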


6.16 WORKED EXAMPLES USING STATISTICAL PACKAGES 


In this section, we illustrate the application of statistical packages to perform 
two-way nested analysis of variance for the data sets of examples presented 
in Sections 6.11 through 6.14. Figures 6.3 through 6.6 illustrate the program 
instructions and the output results for analyzing data in Tables 6.5, 6.7, 6.9, and 
6.11 using SAS GLM/NESTED, SPSS MANOVA/GLM, and BMDP 3V/8V. 
The typical output provides the data format listed at the top, all cell means, and 
DATA CHICKENS;
INPUT TREATMEN PEN WEIGHT;
DATALINES;
1 1 573
1 1 636
1 1 883
. . .
4 2 657
;
PROC GLM;
CLASSES TREATMEN PEN;
MODEL WEIGHT=TREATMEN PEN(TREATMEN);
RUN;

                        The SAS System
                General Linear Models Procedure
CLASS      LEVELS   VALUES
TREATMEN   4        1 2 3 4
PEN        2        1 2
NUMBER OF OBS. IN DATA SET=48

Dependent Variable: WEIGHT
                              Sum of          Mean
Source             DF        Squares        Square    F Value   Pr > F
Model               7      179631.58      25661.65       1.04   0.4163
Error              40      982654.33      24566.36
Corrected Total    47     1162285.92

R-Square       C.V.     Root MSE    WEIGHT Mean
0.154550   22.00455       156.74         712.29

Source             DF      Type I SS   Mean Square    F Value   Pr > F
TREATMEN            3       53943.42      17981.14       0.73
PEN(TREATMEN)       4      125688.17      31422.04       1.28

Source             DF    Type III SS   Mean Square    F Value   Pr > F
TREATMEN            3       53943.42      17981.14       0.73
PEN(TREATMEN)       4      125688.17      31422.04       1.28


(i) SAS application: SAS GLM instructions and output for the two-way fixed effects 
nested analysis of variance. 




DATA LIST
  /TREATMEN 1 PEN 3
  WEIGHT 5-8.
BEGIN DATA.
. . .
END DATA.
MANOVA WEIGHT BY
  TREATMEN(1,4)
  PEN(1,2)
  /DESIGN=PEN WITHIN
  TREATMEN VS WITHIN
  TREATMEN VS WITHIN.

Analysis of Variance-Design 1
Tests of Significance for WEIGHT using UNIQUE sums of squares
Source of Variation          SS      MS      F    Sig of F
WITHIN CELLS             982654.
PEN WITHIN TREATMEN      125688.                     .294
TREATMEN                  53943.                     .539
(Model)                  179631.                     .416
(Total)                 1162285.
R-Squared =           .155
Adjusted R-Squared =  .007


(ii) SPSS application: SPSS MANOVA instructions and output for the two-way fixed 
effects nested analysis of variance. 


/INPUT      FILE='C:\SAHAI\TEXTO\EJE15.TXT'.
            FORMAT=FREE.
            VARIABLES=6.
/VARIABLE   NAMES=C1,C2,C3,C4,C5,C6.
/DESIGN     NAMES=T,P,C.
            LEVELS=4,2,6.
            RANDOM=C.
            FIXED=T,P.
            MODEL='T,P(T),C(P)'.

BMDP8V - GENERAL MIXED MODEL ANALYSIS OF VARIANCE - EQUAL CELL SIZES
Release: 7.0 (BMDP/DYNAMIC)

ANALYSIS OF VARIANCE DESIGN
INDEX              T    P    C
NUMBER OF LEVELS   4    2    6
POPULATION SIZE    4    2    INF
MODEL  T, P(T), C(P)

ANALYSIS OF VARIANCE FOR DEPENDENT VARIABLE 1
   SOURCE     ERROR   SUM OF       D.F.   MEAN         F      PROB.
              TERM    SQUARES             SQUARE
1  MEAN       C(TP)   24353252.           24353252.
2  TREATMNT   C(TP)   53943.
3  P(T)       C(TP)   125688.                          1.28   0.2943
4  C(TP)              982654.

   SOURCE     EXPECTED MEAN   ESTIMATES OF
              SQUARE          VARIANCE COMPONENTS
1  MEAN       48(1)+(4)       506847.61927
2  TREATMNT   12(2)+(4)       -548.76829
3  P(T)       6(3)+(4)        1142.61389
4  C(TP)      (4)             24566.35833


(iii) BMDP application: BMDP 8V instructions and output for the two-way fixed effects 
nested analysis of variance. 


FIGURE 6.3. Program Instructions and Output for the Two-Way Fixed Effects 
Nested Analysis of Variance: Weight Gains Data for Example of Section 6.11 
(Table 6.5). 
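The summary statistics printed in Figure 6.3 follow directly from the analysis of variance quantities. The Python sketch below recomputes them from the printed values; it assumes the usual definitions (R-square = SS_Model / SS_Total, C.V. = 100 × Root MSE / mean, and an adjusted R-square based on 7 model degrees of freedom and 48 observations).

```python
import math

# Values printed in Figure 6.3 (SAS GLM): model and total SS, error MS, and the mean.
ss_model, ss_total, ms_error, mean_weight = 179631.58, 1162285.92, 24566.36, 712.29
n_obs, df_model = 48, 7

r_square = ss_model / ss_total                 # SAS "R-Square"
root_mse = math.sqrt(ms_error)                 # SAS "Root MSE"
cv = 100 * root_mse / mean_weight              # SAS "C.V." in percent
# SPSS "Adjusted R-Squared": penalizes R-square for the 7 model degrees of freedom.
adj_r_square = 1 - (1 - r_square) * (n_obs - 1) / (n_obs - df_model - 1)

print(f"R-Square = {r_square:.6f}, Root MSE = {root_mse:.2f}, "
      f"C.V. = {cv:.2f}, Adjusted R-Squared = {adj_r_square:.3f}")
```

The results agree with the printed R-Square (0.154550), Root MSE (156.74), C.V. (22.00455), and the SPSS Adjusted R-Squared (.007) up to rounding of the inputs.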
DATA MOISTURE;
INPUT BATCHES SAMPLE MOISTURE;
DATALINES;
1 1 40
1 1 39
1 2 30
1 2 30
2 1 26
. . .
;
PROC NESTED;
CLASSES BATCHES SAMPLE;

                     The SAS System
         Coefficients of Expected Mean Squares
Source      BATCHES   SAMPLE   ERROR
BATCHES     4         2        1
SAMPLE      0         2        1
ERROR       0         0        1

Variance    Degrees of   Sum of
Source      Freedom      Squares        F Value   Pr > F
TOTAL       59           2108.183333
BATCHES     14           1210.933333     1.492    0.2256
SAMPLE      15            869.750000    63.255    0.0000
ERROR       30             27.500000

Variance    Mean         Variance      Percent
Source      Square       Component     of Total
TOTAL       35.731921    36.577976     100.0000
BATCHES     86.495238     7.127976      19.4871
SAMPLE      57.983333    28.533333      78.0069
ERROR        0.916667     0.916667       2.5061

Mean                      26.78333333
Standard error of mean     1.20066119

(i) SAS application: SAS NESTED instructions and output for the two-way random 
effects nested analysis of variance. 


DATA LIST
  /BATCHES 1-2
  SAMPLE 4
  MOISTURE 6-7.
BEGIN DATA.
 1 1 40
 1 1 39
 1 2 30
 1 2 30
 2 1 26
. . .
END DATA.
GLM MOISTURE BY
  BATCHES SAMPLE
  /DESIGN BATCHES
  SAMPLE(BATCHES)
  /RANDOM BATCHES
  SAMPLE.

Tests of Between-Subjects Effects          Dependent Variable: MOISTURE
Source                          Type III SS   df   Mean Square        F   Sig.
BATCHES           Hypothesis       1210.933   14        86.495    1.492   .226
                  Error             869.750   15        57.983(a)
SAMPLE(BATCHES)   Hypothesis        869.750   15        57.983   63.255   .000
                  Error              27.500   30          .917(b)
a MS(SAMPLE(BATCHES))   b MS(Error)

Expected Mean Squares(a,b)
                                 Variance Component
Source            Var(BATCHES)   Var(SAMPLE(BATCHES))   Var(Error)
BATCHES                  4.000                  2.000        1.000
SAMPLE(BATCHES)           .000                  2.000        1.000
Error                     .000                   .000        1.000
a For each source, the expected mean square equals the sum of the coefficients in the cells times the variance components, plus a quadratic term involving effects in the Quadratic Term cell.
b Expected Mean Squares are based on the Type III Sums of Squares.


(ii) SPSS application: SPSS GLM instructions and output for the two-way random
effects nested analysis of variance. 


/INPUT      FILE='C:\SAHAI\TEXTO\EJE16.TXT'.
            FORMAT=FREE.
            VARIABLES=2.
/VARIABLE   NAMES=A1,A2.
/DESIGN     NAMES=B,S,A.
            LEVELS=15,2,2.
            RANDOM=B,S,A.
            MODEL='B,S(B),A(S)'.

BMDP8V - GENERAL MIXED MODEL ANALYSIS OF VARIANCE - EQUAL CELL SIZES
Release: 7.0 (BMDP/DYNAMIC)

ANALYSIS OF VARIANCE DESIGN
INDEX              B     S     A
NUMBER OF LEVELS   15    2     2
POPULATION SIZE    INF   INF   INF
MODEL  B, S(B), A(S)

ANALYSIS OF VARIANCE FOR DEPENDENT VARIABLE 1
   SOURCE   ERROR   SUM OF        D.F.   MEAN         F        PROB.
            TERM    SQUARES              SQUARE
1  MEAN     BATCH   43040.81667     1    43040.817    497.61   0.0000
2  BATCH    S(B)     1210.93333    14       86.495      1.49   0.2256
3  S(B)     A(BS)     869.75000    15       57.983     63.25   0.0000
4  A(BS)               27.50000    30        0.917

   SOURCE   EXPECTED MEAN          ESTIMATES OF
            SQUARE                 VARIANCE COMPONENTS
1  MEAN     60(1)+4(2)+2(3)+(4)    715.90536
2  BATCH    4(2)+2(3)+(4)            7.12798
3  S(B)     2(3)+(4)                28.53333
4  A(BS)    (4)                      0.91667


(iii) BMDP application: BMDP 8V instructions and output for the two-way random
effects nested analysis of variance. 


FIGURE 6.4 Program Instructions and Output for the Two-Way Random Effects 
Nested Analysis of Variance: Moisture Content of Pigment Paste Data for Example 
of Section 6.12 (Table 6.7). 
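For the all-random design of Figure 6.4, the variance components printed by SAS NESTED can be recovered by solving the expected mean square equations from the bottom up, using the coefficients 4, 2, and 1 shown in the output. The sketch also recomputes the standard error of the mean, assuming the balanced-design identity Var(grand mean) = E(MS_BATCHES) / N.

```python
import math

# Mean squares from the SAS NESTED output in Figure 6.4(i); 15 batches x 2 samples x 2 tests.
ms_batch, ms_sample, ms_error, n_obs = 86.495238, 57.983333, 0.916667, 60

# E(MS_ERROR) = s_e^2, E(MS_SAMPLE) = s_e^2 + 2 s_S^2, E(MS_BATCHES) = s_e^2 + 2 s_S^2 + 4 s_B^2.
var_error = ms_error
var_sample = (ms_sample - ms_error) / 2
var_batch = (ms_batch - ms_sample) / 4
total = var_batch + var_sample + var_error

for name, v in (("BATCHES", var_batch), ("SAMPLE", var_sample), ("ERROR", var_error)):
    print(f"{name:8s} {v:10.6f} {100 * v / total:8.4f}")

se_mean = math.sqrt(ms_batch / n_obs)
print(f"Standard error of mean {se_mean:.6f}")
```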
DATA BREEDING;
INPUT SIRE DAM WEIGHT;
DATALINES;
. . .
;
PROC GLM;
CLASSES SIRE DAM;
MODEL WEIGHT = SIRE DAM(SIRE);
RANDOM SIRE DAM(SIRE)/TEST;
RUN;

                      The SAS System
              General Linear Models Procedure
CLASS   LEVELS   VALUES
SIRE    4        1 2 3 4
DAM     4        1 2 3 4
NUMBER OF OBS. IN DATA SET=52

Dependent Variable: WEIGHT
                             Sum of           Mean
Source             DF       Squares         Square    F Value   Pr > F
Model              11    1920.26007      174.56910       7.35   0.0001
Error              40     949.79762       23.74494
Corrected Total    51    2870.05769

R-Square      C.V.     Root MSE   WEIGHT Mean
0.669067   17.80672     4.87288       27.3654

Source       DF    Type I SS    Mean Square   F Value   Pr > F
SIRE          3   1669.94341      556.64780     23.44   0.0001
DAM(SIRE)     8    250.31667       31.28958      1.32   0.2628

Source       DF  Type III SS    Mean Square   F Value   Pr > F
SIRE          3   1594.12974      531.37658     22.38   0.0001
DAM(SIRE)     8    250.31667       31.28958      1.32   0.2628

Source       Type III Expected Mean Square
SIRE         Var(Error) + 4.1311 Var(DAM(SIRE)) + 12.26 Var(SIRE)
DAM(SIRE)    Var(Error) + 4.2695 Var(DAM(SIRE))

Tests of Hypotheses for Random Model Analysis of Variance
Source: SIRE   Error: 0.9676*MS(DAM(SIRE)) + 0.0324*MS(Error)
                      Denominator   Denominator
DF   Type III MS      DF            MS             F Value   Pr > F
3    531.37658135     8.41          31.045049505   17.1163   0.0006

Source: DAM(SIRE)   Error: MS(Error)
                      Denominator   Denominator
DF   Type III MS      DF            MS             F Value   Pr > F
8    31.289583333     40            23.744940476    1.3177   0.2628

(i) SAS application: SAS GLM instructions and output for the two-way random effects
nested analysis of variance with unequal numbers in the subclasses.


DATA LIST
  /SIRE 1 DAM 3
  WEIGHT 5-6.
BEGIN DATA.
. . .
END DATA.
GLM WEIGHT BY
  SIRE DAM
  /DESIGN SIRE
  DAM(SIRE)
  /RANDOM SIRE DAM.

Tests of Between-Subjects Effects          Dependent Variable: WEIGHT
Source                    Type III SS      df   Mean Square        F   Sig.
SIRE        Hypothesis       1594.130       3       531.377   17.116   .001
            Error             261.114   8.411        31.045(a)
DAM(SIRE)   Hypothesis        250.317       8        31.290    1.318   .263
            Error             949.798      40        23.745(b)
a .968 MS(D(S)) + 3.241E-02 MS(E)   b MS(Error)

Expected Mean Squares(a,b)
                         Variance Component
Source        Var(SIRE)   Var(DAM(SIRE))   Var(Error)
SIRE             12.260            4.131        1.000
DAM(SIRE)          .000            4.269        1.000
Error              .000             .000        1.000
a For each source, the expected mean square equals the sum of the coefficients in the cells times the variance components, plus a quadratic term involving effects in the Quadratic Term cell.
b Expected Mean Squares are based on the Type III Sums of Squares.


(ii) SPSS application: SPSS GLM instructions and output for the two-way random
effects nested analysis of variance with unequal numbers in the subclasses. 


FIGURE 6.5 Program Instructions and Output for the Two-Way Random Effects 
Nested Analysis of Variance with Unequal Numbers in the Subclasses: Breeding 
Data for Example of Section 6.13 (Table 6.9). 
/INPUT      FILE='C:\SAHAI\TEXTO\EJE17.TXT'.
            FORMAT=FREE.
            VARIABLES=3.
/VARIABLE   NAMES=SIRE,DAM,WEIGHT.
/GROUP      CODES(SIRE)=1,2,3,4.
            NAMES(SIRE)=S1,S2,S3,S4.
            CODES(DAM)=1,2,3,4.
            NAMES(DAM)=D1,D2,D3,D4.
/DESIGN     DEPENDENT=WEIGHT.
            RANDOM=SIRE.
            RANDOM=DAM,SIRE.
            RNAMES=S,'D(S)'.
            METHOD=REML.

BMDP3V - GENERAL MIXED MODEL ANALYSIS OF VARIANCE
Release: 7.0 (BMDP/DYNAMIC)
DEPENDENT VARIABLE WEIGHT

PARAMETER    ESTIMATE   STANDARD   EST./      TWO-TAIL PROB.
                        ERROR      ST.DEV.    (ASYM. THEORY)
ERR.VAR.       23.480      5.184
CONSTANT       26.441      3.477      7.603      0.000
SIRE           45.601     39.793
DAM(SIRE)       2.135      3.779

TESTS OF FIXED EFFECTS BASED ON ASYMPTOTIC VARIANCE-COVARIANCE MATRIX
SOURCE      F-STATISTIC   DEGREES OF   PROBABILITY
                          FREEDOM
CONSTANT    57.81         1   51       0.00000


(iii) BMDP application: BMDP 3V instructions and output for the two-way random 
effects nested analysis of variance with unequal numbers in the subclasses. 


FIGURE 6.5 (continued) 
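With unequal subclass numbers, the SIRE test in Figure 6.5(i) uses a synthetic error term, 0.9676 MS(DAM(SIRE)) + 0.0324 MS(Error), with 8.41 denominator degrees of freedom. These numbers appear to come from matching the expected mean squares and applying Satterthwaite's approximation; the sketch below reproduces them from the printed values under that assumption.

```python
# Values printed by SAS GLM in Figure 6.5(i) for the unbalanced breeding data.
ms_sire, ms_dam, ms_error = 531.37658135, 31.289583333, 23.744940476
df_dam, df_error = 8, 40
k_sire, k_dam = 4.1311, 4.2695   # coefficients of Var(DAM(SIRE)) in the Type III EMS

# Choose c1*MS_dam + c2*MS_error with expectation Var(Error) + k_sire*Var(DAM(SIRE)),
# i.e. E(MS_sire) without the Var(SIRE) term. Both mean squares have Var(Error)
# coefficient 1, so c1 + c2 = 1.
c1 = k_sire / k_dam
c2 = 1 - c1
ms_denom = c1 * ms_dam + c2 * ms_error

# Satterthwaite approximation to the degrees of freedom of the synthetic mean square.
df_denom = ms_denom ** 2 / ((c1 * ms_dam) ** 2 / df_dam + (c2 * ms_error) ** 2 / df_error)
f_sire = ms_sire / ms_denom
print(f"c1 = {c1:.4f}, c2 = {c2:.4f}, MS = {ms_denom:.4f}, "
      f"df = {df_denom:.2f}, F = {f_sire:.3f}")
```

The small discrepancy in the synthetic mean square (31.0450 here versus 31.045049505 in the output) comes from the EMS coefficients being printed to only four decimals.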




DATA LITTER;
INPUT SIRE DAM WEIGHT;
DATALINES;
1 1 2.77
1 1 2.38
1 2 2.58
1 2 2.94
. . .
5 2 2.48
;
PROC GLM;
CLASSES SIRE DAM;
MODEL WEIGHT=SIRE DAM(SIRE);
RANDOM DAM(SIRE);
TEST H=SIRE E=DAM(SIRE);
RUN;

                      The SAS System
              General Linear Models Procedure
CLASS   LEVELS   VALUES
SIRE    5        1 2 3 4 5
DAM     2        1 2
NUMBER OF OBS. IN DATA SET=20

Dependent Variable: WEIGHT
                             Sum of          Mean
Source             DF       Squares        Square    F Value   Pr > F
Model               9    0.66328000    0.07369778       1.90   0.1649
Error              10    0.38700000    0.03870000
Corrected Total    19    1.05028000

R-Square        C.V.     Root MSE     WEIGHT Mean
0.631527   7.6427022   0.19672316        2.574000

Source       DF   Type III SS   Mean Square   F Value   Pr > F
SIRE          4    0.09973000    0.02493250      0.64   0.6433
DAM(SIRE)     5    0.56355000    0.11271000      2.91   0.0707

Source       Type III Expected Mean Square
SIRE         Var(Error) + 2 Var(DAM(SIRE)) + Q(SIRE)
DAM(SIRE)    Var(Error) + 2 Var(DAM(SIRE))

Tests of Hypotheses using the Type III MS for DAM(SIRE) as an error term
Source       DF   Type III SS   Mean Square   F Value   Pr > F
SIRE          4    0.09973000    0.02493250      0.22   0.9155


(i) SAS application: SAS GLM instructions and output for the two-way mixed effects 
nested analysis of variance. 


FIGURE 6.6 Program Instructions and Output for the Two-Way Mixed Effects 
Nested Analysis of Variance: Average Daily Weight Gains Data for Example of 
Section 6.14 (Table 6.11). 
DATA LIST
  /SIRE 1 DAM 3
  WEIGHT 5-8(2).
BEGIN DATA.
1 1 2.77
1 1 2.38
1 2 2.58
1 2 2.94
. . .
5 2 2.48
END DATA.
GLM WEIGHT BY
  SIRE DAM
  /DESIGN SIRE
  DAM(SIRE)
  /RANDOM DAM.

Tests of Between-Subjects Effects          Dependent Variable: WEIGHT
Source                    Type III SS   df   Mean Square       F   Sig.
SIRE        Hypothesis      9.973E-02                        .221   .916
            Error                .564              (a)
DAM(SIRE)   Hypothesis           .564                       2.912   .071
            Error                .387   10         (b)
a MS(DAM(SIRE))   b MS(Error)

Expected Mean Squares(a,b)
                     Variance Component
Source        Var(DAM(SIRE))   Var(Error)   Quadratic Term
SIRE                   2.000        1.000   Sire
DAM(SIRE)              2.000        1.000
Error                   .000        1.000
a For each source, the expected mean square equals the sum of the coefficients in the cells times the variance components, plus a quadratic term involving effects in the Quadratic Term cell.
b Expected Mean Squares are based on the Type III Sums of Squares.


(ii) SPSS application: SPSS GLM instructions and output for the two-way mixed effects 
nested analysis of variance. 


/INPUT      FILE='C:\SAHAI\TEXTO\EJE18.TXT'.
            FORMAT=FREE.
            VARIABLES=2.
/VARIABLE   NAMES=PIG1,PIG2.
/DESIGN     NAMES=SIRE,DAM,PIG.
            LEVELS=5,2,2.
            RANDOM=DAM,PIG.
            FIXED=SIRE.
            MODEL='S,D(S),P(D)'.

BMDP8V - GENERAL MIXED MODEL ANALYSIS OF VARIANCE - EQUAL CELL SIZES
Release: 7.0 (BMDP/DYNAMIC)

ANALYSIS OF VARIANCE DESIGN
INDEX              S   D     P
NUMBER OF LEVELS   5   2     2
POPULATION SIZE    5   INF   INF
MODEL  S, D(S), P(D)

ANALYSIS OF VARIANCE FOR DEPENDENT VARIABLE 1
   SOURCE   ERROR   SUM OF       D.F.   MEAN         F         PROB.
            TERM    SQUARES             SQUARE
1  MEAN     D(S)    132.509519     1    132.509519   1175.67   0.0000
2  SIRE     D(S)      0.099730     4      0.024933      0.22   0.9155
3  D(S)     P(SD)     0.563550     5      0.112710      2.91   0.0707
4  P(SD)              0.387000    10      0.038700

   SOURCE   EXPECTED MEAN     ESTIMATES OF
            SQUARE            VARIANCE COMPONENTS
1  MEAN     20(1)+2(3)+(4)     6.61984
2  SIRE     4(2)+2(3)+(4)     -0.02194
3  D(S)     2(3)+(4)           0.03700
4  P(SD)    (4)                0.03870


(iii) BMDP application: BMDP 8V instructions and output for the two-way mixed 
effects nested analysis of variance. 


FIGURE 6.6 (continued) 


the entries of the analysis of variance table. Note that in each case the results agree with those obtained from the manual computations in Sections 6.11 through 6.14. However, in an unbalanced design, certain tests of significance may differ from one program to another, since the programs may use different types of sums of squares.


EXERCISES 


1. An experiment was designed to study the ignition rate of dynamite from three different explosive-forming processes. Four types of dynamite were randomly selected from each explosive-forming process, and three measurements of ignition rate were made on each type. The data in certain standard units are given as follows.

Explosive Process          1                         2                         3
Dynamite Type       1     2     3     4       1     2     3     4       1     2     3     4
                 28.1  23.0  18.3  18.1    23.1  26.3  21.2  38.0    17.1  38.1  41.1  28.0
                 32.3  31.1  20.6  19.0    20.2  27.7  24.6  30.0    18.0  24.5  37.5  32.3
                 29.5  23.5  17.5  16.6    17.6  24.8  20.3  28.0    23.7  27.3  53.6  36.5

(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in ignition rates among the three explosive-forming processes. Use α = 0.05.

(d) Test whether there are differences in ignition rates among dynamite types within the explosive processes. Use α = 0.05.

(e) Estimate the variance components of the model and determine 95 percent confidence intervals for them.

2. A manufacturing company wishes to study the tensile strength of yarns produced on four different looms. In the experiment, 12 machinists were selected at random, each loom was run by three different machinists, and two specimens from each machinist were obtained and tested. The data in certain standard units are given as follows.

Loom              1                    2                    3                    4
Machinist     1      2     3       1     2     3       1     2     3       1     2     3
           38.2  53.55  15.3    61.3  41.5  35.3    47.1  22.5  14.7    15.5  19.3  21.6
           21.6  51.5   26.7    58.3  38.5  27.3    34.3  25.7  26.3    32.3  35.7  26.5

(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in the tensile strength among the four looms. Use α = 0.05.

(d) Test whether there are differences in the tensile strength among machinists within looms. Use α = 0.05.

(e) Estimate the variance components of the model.

3. A manufacturing company wishes to study the material variability of a particular product being manufactured on three different machines. Each machine operates in two shifts, and four samples are randomly chosen from each shift. The data in certain standard units are given as follows.

Machine          1                2                3
Shift        1      2        1      2        1      2
          23.5   19.3     25.1   23.55    25.0   27.3
          20.7   20.5     26.5   21.33    19.5   26.5
          22.9   21.3     24.7   22.6     23.4   26.3
          23.3   19.7     25.3   24.7     22.3   25.8

(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in the material variability among the machines. Use α = 0.05.

(d) Test whether there are differences in the material variability between the shifts within the machines. Use α = 0.05.


4. An industrial firm wishes to streamline production scheduling by assigning one time standard to a particular class of machines. An experiment was designed wherein three machines are randomly selected and each machine is assigned to a different group of three operators selected at random. Each operator uses the machine three times at different periods during a given week. The data in certain standard units are given as follows.

Machine             1                       2                       3
Operator      1      2      3        1      2      3        1      2      3
          103.2  104.1  103.8     99.5  102.6   99.7    107.4  106.0  105.4
          104.3  104.6  102.7     99.8  101.7  101.2    107.6  103.0  104.4
          105.1  103.7  101.5     98.7  103.5  101.7    108.1  104.2  103.7

(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in completion time among machines. Use α = 0.05.

(d) Test whether there are differences in completion time between operators within machines. Use α = 0.05.

(e) Estimate the variance components of the model and determine 95 percent confidence intervals for them.


5. A public health official wishes to test the difference in mean fluoride concentration of water in a community. An experiment was designed wherein three water samples were taken from each of three sources of water supply, and three determinations of fluoride content were performed on each of the nine samples. The data in milligrams of fluoride per liter of water are given as follows.

Supply          1                 2                 3
Sample      1    2    3       1    2    3       1    2    3


           1.7  1.9  1.8     1.9  1.9  2.0     2.4  2.7  2.8
           1.8  1.7  1.6     2.0  2.1  1.8     2.6  2.6  2.5
19 618 0634.7 2.2 22 19 25 #28 £2.46 


(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test the hypothesis that there is no difference in mean fluoride content between samples at a source of water supply. Use α = 0.05.

(d) Test the hypothesis that there is no difference in mean fluoride concentration between the sources of water supply. Use α = 0.05.

6. A nutritional scientist wishes to test the difference in protein levels in animals fed on different dietary regimens. An experiment is designed wherein four animals of a certain species are subjected to each of two dietary regimens, and three samples of blood are drawn from each animal to determine the protein content (in mg/100 ml blood). The data are given as follows.

Dietary Regimen            1                            2
Animal              1     2     3     4       1     2     3     4
                 3.44  3.51  3.54  3.52    4.88  4.91  4.79  4.95
                 3.46  3.50  3.52  3.53    4.84  4.89  4.81  4.93
                 3.47  3.47  3.59  3.57    4.85  4.87  4.82  4.92

(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test the hypothesis that there are no differences in mean protein levels between animals fed on a given dietary regimen. Use α = 0.05.

(d) Test the hypothesis that there are no differences in mean protein levels between the two dietary regimens. Use α = 0.05.

7. An experiment is designed involving a two-stage nested design with 
the levels of factor B nested within the levels of factor A. The relevant 
data are given as follows. 
44.3 41.0 44.3 41.0 38.7 42.1
44.3 43.2 44.3 42.1 38.7 42.1 
Factor A 1 2 3
Factor B 1 2 1 2 3 1 2 


12,2- 3.3- 2 8.3 7.0 13.2 5.7 
10.1 73 133 102 63 93 6.2 
14.2 15.1 9.1 3.5 10.1 

12.5 


(a) Describe the model and the assumptions for the experiment. It is assumed that both factors A and B are fixed.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in the levels of factor A. Use α = 0.05.

(d) Test whether there are differences in the levels of factor B within A. Use α = 0.05.


8. A study is conducted to investigate whether a batch of material is homogeneous by randomly sampling the material in five different vats from the large number of vats produced. Three bags are chosen at random from each vat. Finally, two independent analyses are made on each bag to determine the percentage of a particular substance. The data in certain standard units are given as follows.


39.9 45.4 46.5 38.7 38.7 46.5 36.5 36.5 38.9 
38.7 44.3 45.4 37.6 39.9 46.5 36.5 35.4 39.8 


(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test the hypothesis that there is no difference in mean percentage values between bags within a vat. Use α = 0.05.

(d) Test the hypothesis that there is no difference in mean percentage values between vats. Use α = 0.05.

(e) Estimate the variance components of the model and determine 95 percent confidence intervals on them.


9. Hicks (1956) reported the results of an experiment involving strain measurements on each of four seals made on each of the four heads of each of the five sealing machines. The coded raw data are given as follows.
Machine         1              2              3              4              5
Head        1  2  3  4     1  2  3  4     1  2  3  4     1  2  3  4     1  2  3  4


6 13 1 7 10 2 4 0 0 10 8 7 11 5 10 16 3 3 
2 3 100 4 9 1 1 3 0 11 5 2 0 10 8 8 470 +7 
0 9 07 7 174 5 60 5 6 9 670 2 4 
8 8 6 9 12 10 9 15 77 4 #4 459 3 2 0 


Source: Hicks (1956, p. 14). Used with permission. 


(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test the hypothesis that there is no difference in mean strain values between the heads within a machine. Use α = 0.05.

(d) Test the hypothesis that there is no difference in mean strain values between the machines. Use α = 0.05.

10. Sokal and Rohlf (1995, p. 294) reported data from an experiment designed to investigate variation in the blood pH of female mice. The experiment was carried out on 15 dams that were mated over a period of time with either two or three sires. Each sire was mated to different dams, and measurements were made on the blood pH reading of a female offspring. The following data refer to a subset of 5 dams that were randomly selected from the 15 dams in the experiment.

Dam             1             2                 3                  4                 5
Sire         1     2       1     2       1     2     3        1     2        1     2     3
pH        7.48  7.48    7.38  7.37    7.41  7.47  7.53     7.39  7.50     7.39  7.43  7.46
Reading   7.48  7.53    7.48  7.31    7.42  7.36  7.40     7.31  7.44     7.37  7.38  7.44
          7.52  7.43    7.46  7.45    7.36  7.43  7.44     7.30  7.40     7.33  7.44  7.37
          7.54 7.39 7.41 7.47 7.38 7.40 7.41 7.45 7.43 7.54

Source: Sokal and Rohlf (1995, p. 294). Used with permission.

(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in mean pH readings between dams. Use α = 0.05.

(d) Test whether there are differences in mean pH readings between sires within dams. Use α = 0.05.

(e) Estimate the variance components of the model.

11. Marcuse (1949) reported results from an experiment designed to investigate the moisture content of cheese. The experiment was conducted by sampling three different lots from the large number of lots produced. Two samples of cheese were selected at random from each lot. Finally, two subsamples per sample were chosen, and independent analyses were made on each subsample to determine the percentage of moisture content. The data are given as follows.


Lot 1 2 3 
Sample 1 2 1 2 1 2 


39.02 38.96 35.74 35.58 37.02 35.70 
38.79 39.01 35.41 35.52 36.00 36.04 


Source: Marcuse (1949). Used with permission. 


(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test the hypothesis that there are no differences in mean percentage values of the moisture content between samples within lots. Use α = 0.05.

(d) Test the hypothesis that there are no differences in mean percentage values of the moisture content between lots. Use α = 0.05.

(e) Estimate the variance components of the model and determine 95 percent confidence intervals on them.

12. Sokal and Rohlf (1995, p. 276) reported data from a biological experiment involving 12 female mosquito pupae. The mosquitoes were randomly assigned to three rearing cages, with each cage receiving 4 pupae. The reported responses are independent measurements of the left wings of the mosquitoes, and the data are given as follows.

Cage              1                         2                         3
Mosquito     1     2     3     4       1     2     3     4       1     2     3     4
          58.5  77.8  84.0  70.1    69.8  56.0  50.7  63.8    56.6  77.8  69.9  62.1
          59.5  80.9  83.6  68.3    69.8  54.5  49.3  65.8    57.5  79.2  69.2  64.5

Source: Sokal and Rohlf (1995, p. 276). Used with permission.

(a) Describe the model and assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test the hypothesis that there are no differences in mean measurement values between mosquitoes within a cage. Use α = 0.05.

(d) Test the hypothesis that there are no differences in mean measurement values between cages. Use α = 0.05.

(e) Estimate the variance components of the model and determine 95 percent confidence intervals on them.
13. Steel and Torrie (1980, p. 154) reported data from a greenhouse experiment that examined the growth of mint plants. A large group of plants was assigned at random to pots, with each pot receiving four plants. Treatments were randomly assigned to pots, with each treatment receiving three pots. There were six fixed treatments representing the combinations of two crossed factors: three levels of hours of daylight and two levels of temperature. Observations were made on individual plants, where the response variable was the one-week stem growth of the mint plant. The data are given as follows.


Treatment* 


Pot 1 


Treatment* 


Pot 1 


8.5 
6.0 
9.0 
8.5 


Source: Steel and Torrie (1980, p.


6.5 
7.0 
8.0 
6.5 


7.0 
7.0 
7.0 
7.0 


6.0 
5.5 
3.5 
7.0 


6.0 
8.5 
4.5 
7.5 


154). Used with permission. 


6.5 
6.5 
8.5 
a3 


7.0 
9.0 
8.5 
8.5 


6.0 
7.0 
7.0 
7.0 


11.0 
7.0 
9.0 
8.0 


* Treatments representing combinations of hours of daylight and temperatures are defined as follows:

                   Hours of Daylight
Temperature        8      12      16
Low                1       2       3
High               4       5       6

(a) Describe the model and assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test the hypothesis that there are no differences in mean stem growths between pots within a treatment. Use α = 0.05.

(d) Test the hypothesis that there are no differences in mean stem growths between the treatments. Use α = 0.05.

(e) Estimate the variance components of the model and determine 95 percent confidence intervals on them.
(f) Estimate the contrast representing the difference between the two treatments low temperature, 8 hours and high temperature, 8 hours (treatments 1 and 4), and set a 95 percent confidence interval on it.

(g) Estimate the contrast defined as (Low-8) + (Low-12) + (Low-16) - (High-8) - (High-12) - (High-16), and set a 95 percent confidence interval on it.

14. Sokal and Rohlf (1994, p. 364) reported data from an experiment designed to investigate the effects of breed and maturity of purebred cows on butterfat content. Five breeds of purebred dairy cattle were taken from Canadian records, and random samples of 10 mature (>5 years old) and 10 two-year-old cows were selected from each of the five breeds. The following data give the average butterfat percentage for each cow.

Breed       Ayrshire       Canadian       Guernsey       Holstein-Friesian   Jersey
Cow       Mature  2-yr   Mature  2-yr   Mature  2-yr   Mature  2-yr        Mature  2-yr
            3.74  4.44     3.92  4.29     4.54  5.30     3.40  3.79          4.80  5.75
            4.01  4.37     4.95  5.24     5.18  4.50     3.55  3.66          6.45  5.14
            3.77  4.25     4.47  4.43     5.75  4.59     3.83  3.58          5.18  5.25
            3.78  3.71     4.28  4.00     5.04  5.04     3.95  3.38          4.49  4.76
            4.10  4.08     4.07  4.62     4.64  4.83     4.43  3.71          5.24  5.18
            4.06  3.90     4.10  4.29     4.79  4.55     3.70  3.94          5.70  4.22
            4.27  4.41     4.38  4.85     4.72  4.97     3.30  3.59          5.41  5.98
            3.94  4.11     3.98  4.66     3.88  5.38     3.93  3.55          4.77  4.85
            4.11  4.37     4.46  4.40     5.28  5.39     3.58  3.55          5.18  6.55
            4.25  3.53     5.05  4.33     4.66  5.97     3.54  3.43          5.23  5.72

Source: Sokal and Rohlf (1994, p. 364). Used with permission.

(a) Describe the model and the assumptions for the experiment. Would you use Model I, Model II, or Model III? Explain.

(b) Analyze the data and report the analysis of variance table.

(c) Test the hypothesis that there are no differences in average percentage values of the butterfat content between mature and two-year-old cows. Use α = 0.05.

(d) Test the hypothesis that there are no differences in average percentage values of the butterfat content between the breeds. Use α = 0.05.

(e) If you assumed any of the factors to be random, estimate the variance components of the model and construct 95 percent confidence intervals on them.


7 Three-Way and Higher-Order Nested Classifications


7.0 PREVIEW 


The results of the preceding chapter can be readily extended to the case of 
three-way and the general g-way nested or hierarchical classifications. As an 
example of a three-way nested classification, suppose a chemical company 
wishes to examine the strength of a certain liquid chemical. The chemical is 
made in large vats and then is barreled. To study the strength of the chemical, 
an analyst randomly selects three different vats of the product. Three barrels are 
selected at random from each vat and then three samples are taken from each 
barrel. Finally, two independent measurements are made on each sample. The 
physical layout can be depicted schematically as shown in Figure 7.1. In this 
experiment, barrels are nested within the levels of the factor vats and samples 
are nested within the levels of the factor barrels. This is the so-called three-way
nested classification having two replicates or measurements. In this chapter, we 
consider the three-way nested classification and indicate its generalization to 
higher-order nested classifications. 


7.1 MATHEMATICAL MODEL 


Consider three factors A, B, and C having a, b, and c levels, respectively. The b levels of factor B are nested under each level of factor A, the c levels of factor C are nested under each level of factor B (within A), and there are n replicates within each combination of levels of A, B, and C. The analysis of variance model for this type of experimental layout is taken as


$$
y_{ijk\ell} = \mu + \alpha_i + \beta_{j(i)} + \gamma_{k(ij)} + e_{\ell(ijk)}, \qquad
\begin{cases} i = 1, \ldots, a \\ j = 1, \ldots, b \\ k = 1, \ldots, c \\ \ell = 1, \ldots, n \end{cases} \tag{7.1.1}
$$

where $\mu$ is the general mean, $\alpha_i$ is the effect due to the $i$-th level of factor A, $\beta_{j(i)}$ is the effect due to the $j$-th level of factor B within the $i$-th level of factor A, $\gamma_{k(ij)}$ is the effect due to the $k$-th level of factor C within the $j$-th level of factor B and the $i$-th level of factor A, and $e_{\ell(ijk)}$ is the error term that represents the variation within each cell.

FIGURE 7.1 A Layout for the Three-Way Nested Design Where Barrels Are Nested within Vats and Samples Are Nested within Barrels.

When all the factors have systematic effects, Model I is applicable to the model (7.1.1). When all the factors are random, Model II is appropriate; and when some factors are fixed and others are random, a mixed model or Model III is the appropriate one. The assumptions under Models I, II, and III are exact parallels to those of the model (6.1.1). For example, if we assume that A is fixed and B and C are random, then the $\alpha_i$'s are unknown fixed constants with the restriction that $\sum_{i=1}^{a} \alpha_i = 0$, and the $\beta_{j(i)}$'s, $\gamma_{k(ij)}$'s, and $e_{\ell(ijk)}$'s are mutually and completely uncorrelated random variables with zero means and variances $\sigma_\beta^2$, $\sigma_\gamma^2$, and $\sigma_e^2$, respectively.


7.2 ANALYSIS OF VARIANCE 


The calculations of the sums of squares and the analysis of variance for the 
three-way nested design are similar to the analysis for the two-way nested 
design presented in Chapter 6. The formulae for the sums of squares together 
with their computational forms are simple extensions of the formulae for the 
two-way nested design given in Section 6.7. Thus, starting with the identity 


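$$
y_{ijk\ell} - \bar{y}_{....} = (\bar{y}_{i...} - \bar{y}_{....}) + (\bar{y}_{ij..} - \bar{y}_{i...}) + (\bar{y}_{ijk.} - \bar{y}_{ij..}) + (y_{ijk\ell} - \bar{y}_{ijk.}),
$$
the total sum of squares is partitioned as
$$
SS_T = SS_A + SS_{B(A)} + SS_{C(B)} + SS_E,
$$
where
$$
\begin{aligned}
SS_T &= \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n} (y_{ijk\ell} - \bar{y}_{....})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n} y_{ijk\ell}^2 - \frac{y_{....}^2}{abcn}, \\
SS_A &= bcn \sum_{i=1}^{a} (\bar{y}_{i...} - \bar{y}_{....})^2 = \frac{1}{bcn} \sum_{i=1}^{a} y_{i...}^2 - \frac{y_{....}^2}{abcn}, \\
SS_{B(A)} &= cn \sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{y}_{ij..} - \bar{y}_{i...})^2 = \frac{1}{cn} \sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij..}^2 - \frac{1}{bcn} \sum_{i=1}^{a} y_{i...}^2, \\
SS_{C(B)} &= n \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (\bar{y}_{ijk.} - \bar{y}_{ij..})^2 = \frac{1}{n} \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} y_{ijk.}^2 - \frac{1}{cn} \sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij..}^2,
\end{aligned}
$$
and
$$
SS_E = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n} (y_{ijk\ell} - \bar{y}_{ijk.})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n} y_{ijk\ell}^2 - \frac{1}{n} \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} y_{ijk.}^2,
$$
with
$$
\begin{aligned}
y_{ijk.} &= \sum_{\ell=1}^{n} y_{ijk\ell}, & \bar{y}_{ijk.} &= y_{ijk.}/n, \\
y_{ij..} &= \sum_{k=1}^{c}\sum_{\ell=1}^{n} y_{ijk\ell}, & \bar{y}_{ij..} &= y_{ij..}/cn, \\
y_{i...} &= \sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n} y_{ijk\ell}, & \bar{y}_{i...} &= y_{i...}/bcn, \\
y_{....} &= \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n} y_{ijk\ell}, & \bar{y}_{....} &= y_{....}/abcn.
\end{aligned}
$$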
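The computational forms above translate directly into code. The following Python sketch is illustrative only: it assumes the data are stored as nested lists `y[i][j][k][l]` of equal lengths (a balanced layout), and the function name `nested3_ss` is our own.

```python
def nested3_ss(y):
    """Sums of squares for a balanced three-way nested design (Section 7.2).

    Assumed layout: y[i][j][k][l] is replicate l of level k of C within
    level j of B within level i of A, with equal numbers at every stage.
    """
    a, b, c, n = len(y), len(y[0]), len(y[0][0]), len(y[0][0][0])
    N = a * b * c * n

    # Totals y_ijk., y_ij.., y_i... and y....
    t_ijk = [[[sum(cell) for cell in row] for row in block] for block in y]
    t_ij = [[sum(row) for row in block] for block in t_ijk]
    t_i = [sum(block) for block in t_ij]
    t = sum(t_i)

    raw = sum(v * v for block in y for row in block for cell in row for v in cell)
    cf = t * t / N  # correction term y....^2 / abcn
    q_ijk = sum(v * v for block in t_ijk for row in block for v in row) / n
    q_ij = sum(v * v for block in t_ij for v in block) / (c * n)
    q_i = sum(v * v for v in t_i) / (b * c * n)

    return {
        "A": q_i - cf,
        "B(A)": q_ij - q_i,
        "C(B)": q_ijk - q_ij,
        "E": raw - q_ijk,
        "T": raw - cf,
    }


# Example: every observation is 1 under A-level 1 and 3 under A-level 2
# (a = b = c = n = 2), so all of the variation lies between levels of A.
example = [[[[1.0] * 2] * 2] * 2, [[[3.0] * 2] * 2] * 2]
print(nested3_ss(example))  # -> {'A': 16.0, 'B(A)': 0.0, 'C(B)': 0.0, 'E': 0.0, 'T': 16.0}
```

Working from totals rather than deviations mirrors the "computational forms" of the text; for large or badly scaled data the deviation forms are numerically safer.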


Note that the nested sums of squares are related to the sums of squares for main effects and interactions, computed as if all the factors were crossed, as follows:
$$
SS_{B(A)} = SS_B + SS_{AB}, \qquad SS_{C(B)} = SS_C + SS_{AC} + SS_{BC} + SS_{ABC}.
$$
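Both identities are algebraic, so they can be confirmed numerically. The Python sketch below (function name and data layout are our own assumptions) computes the factorial sums of squares as if A, B, and C were crossed, treating the $j$-th level of B and the $k$-th level of C as "the same" level under every level of A, and compares them with the nested sums of squares.

```python
def crossed_vs_nested(y):
    """Compare nested SS with crossed-factor SS for balanced y[i][j][k][l]."""
    a, b, c, n = len(y), len(y[0]), len(y[0][0]), len(y[0][0][0])
    N = a * b * c * n
    I, J, K = range(a), range(b), range(c)
    t = [[[sum(y[i][j][k]) for k in K] for j in J] for i in I]  # cell totals y_ijk.

    def q(totals, size):
        # Uncorrected SS of a list of totals, each built from `size` observations
        return sum(v * v for v in totals) / size

    cf = sum(t[i][j][k] for i in I for j in J for k in K) ** 2 / N
    tA = [sum(t[i][j][k] for j in J for k in K) for i in I]
    tB = [sum(t[i][j][k] for i in I for k in K) for j in J]
    tC = [sum(t[i][j][k] for i in I for j in J) for k in K]
    tAB = [sum(t[i][j]) for i in I for j in J]
    tAC = [sum(t[i][j][k] for j in J) for i in I for k in K]
    tBC = [sum(t[i][j][k] for i in I) for j in J for k in K]
    tABC = [t[i][j][k] for i in I for j in J for k in K]

    # Crossed (factorial) sums of squares
    ss_a = q(tA, b * c * n) - cf
    ss_b = q(tB, a * c * n) - cf
    ss_c = q(tC, a * b * n) - cf
    ss_ab = q(tAB, c * n) - cf - ss_a - ss_b
    ss_ac = q(tAC, b * n) - cf - ss_a - ss_c
    ss_bc = q(tBC, a * n) - cf - ss_b - ss_c
    ss_abc = q(tABC, n) - cf - ss_a - ss_b - ss_c - ss_ab - ss_ac - ss_bc

    # Nested sums of squares, computed directly as in Section 7.2
    ss_b_in_a = q(tAB, c * n) - q(tA, b * c * n)
    ss_c_in_b = q(tABC, n) - q(tAB, c * n)
    return (ss_b_in_a, ss_b + ss_ab), (ss_c_in_b, ss_c + ss_ac + ss_bc + ss_abc)


# Arbitrary illustrative data with a = 2, b = 3, c = 2, n = 2
data = [[[[float((5 * i + 2 * j + 3 * k + l) % 7) for l in range(2)]
          for k in range(2)] for j in range(3)] for i in range(2)]
(b_nested, b_crossed), (c_nested, c_crossed) = crossed_vs_nested(data)
print(b_nested, b_crossed)
print(c_nested, c_crossed)
```

Each printed pair should agree up to floating-point rounding.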


The expected mean squares can be obtained by proceeding directly as before or by using the general rules for obtaining expected mean squares. The resulting analysis of variance is summarized in Table 7.1. Various types of mixed models may arise; the analysis of variance table gives the expectations of the mean squares for the case in which A and B are fixed and C is random. If we assume a mixed model with factor A having a systematic effect and factors B and C having random effects, then the analysis is the same as under Model II except that $\sigma_\alpha^2$ is now replaced by $\sum_{i=1}^{a} \alpha_i^2/(a-1)$. The remaining four cases that fall under the mixed model (i.e., those in which factors B and C have effects of opposite types, one fixed and the other random) are left as an exercise. It should, however, be pointed out that the term involving $\beta_{j(i)}$ disappears from the expectation of $MS_A$ when factor B has a systematic effect, and the term involving $\gamma_{k(ij)}$ disappears from the expectations of $MS_A$ and $MS_{B(A)}$ when factor C has a systematic effect. Thus, special care is needed when determining the appropriate tests to be made.


7.3 TESTS OF HYPOTHESES AND ESTIMATION


Under the assumption of normality, the four mean squares are independently distributed as multiples of chi-square variables, so that the ratio of any two mean squares, each divided by its expectation, is distributed as the variance ratio $F$.¹ The expected mean square column of Table 7.1 suggests the proper test statistics for testing the particular hypotheses of interest. Thus, under Model I, F tests for the effects of all three factors can be performed by dividing the corresponding mean squares by $MS_E$. Under Model II, tests exist for the effects of factor A and of the two nested factors B and C. The factor A effect is tested by means of the ratio $MS_A/MS_{B(A)}$, factor B by the ratio $MS_{B(A)}/MS_{C(B)}$, and factor C by the ratio $MS_{C(B)}/MS_E$.² Under Model III, with factor A having a systematic effect and factors B and C having random effects, the tests for all the factor effects are the same as under Model II. The tests for other variations of mixed models are obtained similarly.

The variance components for the various model factors are readily estimated by the customary analysis of variance procedure. For example, under Model II, the desired estimators are:
$$
\begin{aligned}
\hat{\sigma}_e^2 &= MS_E, \\
\hat{\sigma}_\gamma^2 &= (MS_{C(B)} - MS_E)/n, \\
\hat{\sigma}_\beta^2 &= (MS_{B(A)} - MS_{C(B)})/cn, \\
\hat{\sigma}_\alpha^2 &= (MS_A - MS_{B(A)})/bcn.
\end{aligned} \tag{7.3.1}
$$
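Under Model II the tests and the estimators (7.3.1) reduce to a few divisions. A Python sketch, assuming the four mean squares have already been computed; the function and key names are illustrative. The numbers plugged in are the mean squares of Table 7.5 in the worked example of Section 7.7 (b = c = n = 2).

```python
def model2_tests_and_estimates(ms_a, ms_b, ms_c, ms_e, b, c, n):
    """Model II F ratios and variance-component estimates for a balanced
    three-way nested design (Section 7.3, estimators (7.3.1)).

    ms_b stands for MS_B(A) and ms_c for MS_C(B).
    """
    f_ratios = {
        "A": ms_a / ms_b,      # MS_A / MS_B(A): df a-1 and a(b-1)
        "B(A)": ms_b / ms_c,   # MS_B(A) / MS_C(B): df a(b-1) and ab(c-1)
        "C(B)": ms_c / ms_e,   # MS_C(B) / MS_E: df ab(c-1) and abc(n-1)
    }
    estimates = {
        "sigma_e^2": ms_e,
        "sigma_gamma^2": (ms_c - ms_e) / n,
        "sigma_beta^2": (ms_b - ms_c) / (c * n),
        "sigma_alpha^2": (ms_a - ms_b) / (b * c * n),
    }
    return f_ratios, estimates


f, est = model2_tests_and_estimates(13.250, 6.042, 3.792, 0.833, b=2, c=2, n=2)
print({k: round(v, 2) for k, v in f.items()})
print({k: round(v, 4) for k, v in est.items()})
```

Negative estimates are returned unchanged; the text discusses how to interpret them.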


The estimators (7.3.1) are the so-called best unbiased estimators discussed before, but the estimates of $\sigma_\gamma^2$, $\sigma_\beta^2$, and $\sigma_\alpha^2$ can be negative. Note that the estimation of variance components is especially simple for hierarchically nested designs: one takes the difference between the mean square for the factor involving the variance component of interest and the mean square following it, and divides that difference by the coefficient of the variance component in the expected mean square. An exact confidence interval for $\sigma_e^2$ can be constructed by noting that $abc(n-1)MS_E/\sigma_e^2$ has a chi-square distribution with $abc(n-1)$ degrees of freedom.³

¹ For a proof of this result see Scheffé (1959, pp. 251-254).
² These are all exact F tests and their power is readily expressed in terms of the (central) F distribution.
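The interval follows from inverting the pivotal quantity $abc(n-1)MS_E/\sigma_e^2$. A minimal Python sketch, assuming the caller supplies the two chi-square quantiles (no statistics library is used, so they must come from a table); the function name is illustrative. The usage line takes $MS_E = 0.833$ on 24 degrees of freedom from Table 7.5, with the tabulated 0.025 and 0.975 quantiles 12.401 and 39.364.

```python
def error_variance_ci(ms_e, df_e, chi2_lo, chi2_hi):
    """Exact CI for sigma_e^2 from df_e * MS_E / sigma_e^2 ~ chi-square(df_e).

    df_e = abc(n - 1) for the balanced three-way nested design.
    chi2_lo, chi2_hi: the alpha/2 and 1 - alpha/2 chi-square quantiles with
    df_e degrees of freedom, supplied by the caller.
    """
    if not 0 < chi2_lo < chi2_hi:
        raise ValueError("expected 0 < chi2_lo < chi2_hi")
    ss_e = df_e * ms_e  # recovers SS_E
    return ss_e / chi2_hi, ss_e / chi2_lo


low, high = error_variance_ci(0.833, 24, 12.401, 39.364)
print(round(low, 3), round(high, 3))
```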

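Under Model III, with A fixed and B and C random, exact confidence intervals on the means $\mu + \alpha_i$ and on a linear combination of means $\sum_{i=1}^{a} \ell_i(\mu + \alpha_i)$ can be obtained as in Section 6.6. Thus, exact $100(1-\alpha)$ percent confidence intervals for $\mu + \alpha_i$ and $\sum_{i=1}^{a} \ell_i(\mu + \alpha_i)$ are given by
$$
\bar{y}_{i...} \pm t[a(b-1), 1-\alpha/2] \sqrt{\frac{MS_{B(A)}}{bcn}}
$$
and
$$
\sum_{i=1}^{a} \ell_i \bar{y}_{i...} \pm t[a(b-1), 1-\alpha/2] \sqrt{\frac{MS_{B(A)}}{bcn} \sum_{i=1}^{a} \ell_i^2},
$$
respectively.⁴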
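These intervals are simple to evaluate once $MS_{B(A)}$ is known. A Python sketch, assuming the caller supplies the $t$ quantile with $a(b-1)$ degrees of freedom; the function names and the inputs in the usage lines are hypothetical, chosen only to show the arithmetic.

```python
import math


def mean_ci(ybar_i, ms_b_a, b, c, n, t_crit):
    """CI for mu + alpha_i (A fixed; B, C random): ybar_i +/- t sqrt(MS_B(A)/bcn)."""
    half = t_crit * math.sqrt(ms_b_a / (b * c * n))
    return ybar_i - half, ybar_i + half


def combination_ci(ybars, weights, ms_b_a, b, c, n, t_crit):
    """CI for sum_i l_i (mu + alpha_i); the weights l_i need not sum to zero."""
    estimate = sum(w * m for w, m in zip(weights, ybars))
    half = t_crit * math.sqrt(sum(w * w for w in weights) * ms_b_a / (b * c * n))
    return estimate - half, estimate + half


# Hypothetical inputs, not taken from the text
print(mean_ci(10.0, 8.0, 2, 2, 2, 2.0))  # -> (8.0, 12.0)
print(combination_ci([10.0, 12.0], [1, -1], 8.0, 2, 2, 2, 2.0))
```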


7.4 UNEQUAL NUMBERS IN THE SUBCLASSES 


Consider three factors A, B, and C, where B is nested within A and C is nested within B. Suppose each level of A has $b_i$ levels of B, each level of B has $c_{ij}$ levels of C, and $n_{ijk}$ samples are taken from each level of C. Here, the model remains the same as (7.1.1), where $i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b_i$; $k = 1, 2, \ldots, c_{ij}$; and $\ell = 1, 2, \ldots, n_{ijk}$. The total number of observations is
$$
N = \sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{c_{ij}} n_{ijk}.
$$

³ For a discussion of methods for constructing confidence intervals for the individual variance components $\sigma_\gamma^2$, $\sigma_\beta^2$, and $\sigma_\alpha^2$, the total variance $\sigma_e^2 + \sigma_\gamma^2 + \sigma_\beta^2 + \sigma_\alpha^2$, ratios of variance components, and proportions of variability such as $\sigma_\alpha^2/(\sigma_e^2 + \sigma_\gamma^2 + \sigma_\beta^2 + \sigma_\alpha^2)$, see Burdick and Graybill (1992, pp. 92-96).

⁴ Formulae for selecting b, c, and n to minimize the cost of obtaining a sample have been derived by Marcuse (1949) and Vidmar and Brunden (1980).
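Now, the sums of squares in the analysis of variance are computed as follows:
$$
\begin{aligned}
SS_A &= \sum_{i=1}^{a} n_{i..}(\bar{y}_{i...} - \bar{y}_{....})^2 = \sum_{i=1}^{a} \frac{y_{i...}^2}{n_{i..}} - \frac{y_{....}^2}{N}, \\
SS_{B(A)} &= \sum_{i=1}^{a}\sum_{j=1}^{b_i} n_{ij.}(\bar{y}_{ij..} - \bar{y}_{i...})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b_i} \frac{y_{ij..}^2}{n_{ij.}} - \sum_{i=1}^{a} \frac{y_{i...}^2}{n_{i..}}, \\
SS_{C(B)} &= \sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{c_{ij}} n_{ijk}(\bar{y}_{ijk.} - \bar{y}_{ij..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{c_{ij}} \frac{y_{ijk.}^2}{n_{ijk}} - \sum_{i=1}^{a}\sum_{j=1}^{b_i} \frac{y_{ij..}^2}{n_{ij.}},
\end{aligned}
$$
and
$$
SS_E = \sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{c_{ij}}\sum_{\ell=1}^{n_{ijk}} (y_{ijk\ell} - \bar{y}_{ijk.})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{c_{ij}}\sum_{\ell=1}^{n_{ijk}} y_{ijk\ell}^2 - \sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{c_{ij}} \frac{y_{ijk.}^2}{n_{ijk}},
$$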


where the customary notation for totals and means is employed. The resulting analysis of variance is summarized in Table 7.2. The derivations of the expected mean squares can be found in Ganguli (1941) and Scheffé (1959, pp. 255-258). The coefficients of the variance components in the expected mean square column are determined as follows:


_ N—kg 
a Oy 
N—kq 
iz = ——., 
b-—a 
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a—|1 
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a 
aR 
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- be Cj 
ks = DDD min/N: 
i= ede 1 k=1 
bi Cj 
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i=l j=1-k= 
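The coefficients depend only on the subclass sizes $n_{ijk}$. A Python sketch of the formulas above, assuming the sizes are stored as nested lists `sizes[i][j][k]`; the function name is illustrative. Applied to the subclass sizes of the diatom data of Section 7.8, it reproduces the coefficients reported there to within rounding.

```python
def unbalanced_nested3_coefficients(sizes):
    """Coefficients n1..n6 of the variance components (Section 7.4).

    Assumed layout: sizes[i][j][k] = n_ijk, the number of observations in
    level k of C within level j of B within level i of A.
    """
    a = len(sizes)
    b = sum(len(si) for si in sizes)                  # b = sum_i b_i
    c = sum(len(sij) for si in sizes for sij in si)   # c = sum_i sum_j c_ij
    n_ij = [[sum(sij) for sij in si] for si in sizes]  # n_ij.
    n_i = [sum(row) for row in n_ij]                   # n_i..
    N = sum(n_i)

    # sum over k of n_ijk^2, for each (i, j)
    sq_ijk = [[sum(v * v for v in sij) for sij in si] for si in sizes]

    k1 = sum(v * v for v in n_i) / N
    k2 = sum(v * v for row in n_ij for v in row) / N
    k3 = sum(sum(row) for row in sq_ijk) / N
    k4 = sum(sum(v * v for v in n_ij[i]) / n_i[i] for i in range(a))
    k5 = sum(sum(sq_ijk[i]) / n_i[i] for i in range(a))
    k6 = sum(sq_ijk[i][j] / n_ij[i][j]
             for i in range(a) for j in range(len(sizes[i])))

    return {
        "n1": (N - k6) / (c - b),
        "n2": (k6 - k5) / (b - a),
        "n3": (N - k4) / (b - a),
        "n4": (k5 - k3) / (a - 1),
        "n5": (k4 - k2) / (a - 1),
        "n6": (N - k1) / (a - 1),
    }


# Subclass sizes of the diatom data (location -> brick -> slide)
diatom_sizes = [[[6, 8], [4, 6], [4]], [[8, 7], [6, 5]]]
print({k: round(v, 4) for k, v in unbalanced_nested3_coefficients(diatom_sizes).items()})
```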
From Table 7.2, it is evident that no simple F tests are possible except for testing factor C effects. Again, this is so because under the null hypothesis we do not have two mean squares in the analysis of variance table that estimate the same quantity. Furthermore, the mean squares other than $MS_E$ are not distributed as constant times a chi-square random variable, and they are not statistically independent, except that $MS_E$ is independent of the other mean squares. For approximate tests of significance, we determine the coefficients of the variance components, calculate estimates of the variance components, and then determine linear combinations of mean squares to use as the denominators of the F ratios in the Satterthwaite approximation. The variance component estimates, as usual, are obtained by solving the equations obtained by equating the mean squares to their respective expected values. These are the so-called analysis of variance estimates (Mahamunulu (1963)). The formulae for these estimators, including expressions for their sampling variances, are also given in Searle (1971b, pp. 477-479) and Searle et al. (1992, pp. 431-433).
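The synthesized mean square $\sum_i \ell_i MS_i$ and its Satterthwaite degrees of freedom $\nu = (\sum_i \ell_i MS_i)^2 / \sum_i [(\ell_i MS_i)^2/\nu_i]$ can be computed as in the Python sketch below (function name illustrative). It does not guard against negative coefficients $\ell_i$, which do occur (see Section 7.8) and for which the approximation is generally regarded as less dependable. The usage line reproduces the denominator used to test bricks within locations in Section 7.8.

```python
def satterthwaite(coefs, mean_squares, dfs):
    """Synthesized mean square sum(l_i * MS_i) and its Satterthwaite df."""
    terms = [l * ms for l, ms in zip(coefs, mean_squares)]
    synthesized = sum(terms)
    denom = sum(t * t / d for t, d in zip(terms, dfs))
    return synthesized, synthesized * synthesized / denom


# 0.0925 MS_E + 0.9075 MS_C(B), on 45 and 4 degrees of freedom
ms, nu = satterthwaite([0.0925, 0.9075], [42504.223, 30251.865], [45, 4])
print(round(ms, 3), round(nu, 2))
```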

An exact confidence interval for $\sigma_e^2$ can be obtained as in Section 6.10. Burdick and Graybill (1992, pp. 109-116) indicate a method for constructing confidence intervals for $\sigma_\alpha^2$, $\sigma_\beta^2$, and $\sigma_\gamma^2$, with a numerical example.


7.5 FOUR-WAY NESTED CLASSIFICATION 


In this section, we briefly review the analysis of variance for the four-way nested classification, which is the obvious extension of the three-way nested analysis of variance. The model is
$$
y_{ijk\ell m} = \mu + \alpha_i + \beta_{j(i)} + \gamma_{k(ij)} + \delta_{\ell(ijk)} + e_{m(ijk\ell)}, \qquad
\begin{cases} i = 1, \ldots, a \\ j = 1, \ldots, b \\ k = 1, \ldots, c \\ \ell = 1, \ldots, d \\ m = 1, \ldots, n \end{cases} \tag{7.5.1}
$$


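where the meaning of each symbol and the assumptions of the model are stated by direct analogy with (7.1.1). Starting with the identity
$$
\begin{aligned}
y_{ijk\ell m} - \bar{y}_{.....} = {} & (\bar{y}_{i....} - \bar{y}_{.....}) + (\bar{y}_{ij...} - \bar{y}_{i....}) + (\bar{y}_{ijk..} - \bar{y}_{ij...}) \\
& + (\bar{y}_{ijk\ell.} - \bar{y}_{ijk..}) + (y_{ijk\ell m} - \bar{y}_{ijk\ell.}),
\end{aligned}
$$
the total sum of squares is partitioned as
$$
SS_T = SS_A + SS_{B(A)} + SS_{C(B)} + SS_{D(C)} + SS_E,
$$
where
$$
\begin{aligned}
SS_T &= \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{d}\sum_{m=1}^{n} (y_{ijk\ell m} - \bar{y}_{.....})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{d}\sum_{m=1}^{n} y_{ijk\ell m}^2 - \frac{y_{.....}^2}{abcdn}, \\
SS_A &= \frac{1}{bcdn}\sum_{i=1}^{a} y_{i....}^2 - \frac{y_{.....}^2}{abcdn}, \\
SS_{B(A)} &= \frac{1}{cdn}\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij...}^2 - \frac{1}{bcdn}\sum_{i=1}^{a} y_{i....}^2, \\
SS_{C(B)} &= \frac{1}{dn}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} y_{ijk..}^2 - \frac{1}{cdn}\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij...}^2, \\
SS_{D(C)} &= \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{d} y_{ijk\ell.}^2 - \frac{1}{dn}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} y_{ijk..}^2,
\end{aligned}
$$
and
$$
SS_E = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{d}\sum_{m=1}^{n} (y_{ijk\ell m} - \bar{y}_{ijk\ell.})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{d}\sum_{m=1}^{n} y_{ijk\ell m}^2 - \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{d} y_{ijk\ell.}^2,
$$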


with the usual dot and bar notation. The corresponding mean squares, denoted by $MS_A$, $MS_{B(A)}$, $MS_{C(B)}$, $MS_{D(C)}$, and $MS_E$, are obtained by dividing the sums of squares by their respective degrees of freedom. The resulting analysis of variance is summarized in Table 7.3. Analysis of variance tables for nested classifications with more than four factors follow the same general pattern.
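Because every stage follows the same pattern (the uncorrected sum of squares of the totals at one stage minus that of the stage above), the computations for any number of nested factors can be written once. A recursive Python sketch for the balanced case; the nested-list layout and the function names are our own assumptions.

```python
def _total(node):
    """Sum of all observations below a node of the nested lists."""
    return sum(_total(child) for child in node) if isinstance(node, list) else node


def _nodes_at_depth(data, depth):
    """All subtrees at the given depth (depth 0 is the whole data set)."""
    level = [data]
    for _ in range(depth):
        level = [child for node in level for child in node]
    return level


def nested_ss(data, q):
    """Sums of squares for a balanced q-way nested classification.

    Assumed layout: lists nested q + 1 deep; the first q levels index the
    factors A, B(A), C(B), ... and the innermost lists hold the n replicates.
    Returns ([SS_A, SS_B(A), ..., SS_E], SS_T).
    """
    n_obs = len(_nodes_at_depth(data, q + 1))
    uncorrected = []
    for depth in range(q + 2):
        totals = [_total(node) for node in _nodes_at_depth(data, depth)]
        per_total = n_obs // len(totals)  # balanced: equal counts per subtotal
        uncorrected.append(sum(t * t for t in totals) / per_total)
    stages = [uncorrected[d] - uncorrected[d - 1] for d in range(1, q + 2)]
    return stages, uncorrected[-1] - uncorrected[0]


# Four-way illustration (a = b = c = d = n = 2) with arbitrary values
data4 = [[[[[float((i + 2 * j + 3 * k + 5 * l + m) % 4) for m in range(2)]
            for l in range(2)] for k in range(2)] for j in range(2)] for i in range(2)]
stages, ss_t = nested_ss(data4, q=4)
print([round(s, 4) for s in stages], round(ss_t, 4))
```

With `q=3` the same function reproduces the three-way formulas of Section 7.2.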
The expected mean square column of Table 7.3 suggests the proper test statistics for testing the particular hypotheses of interest. For example, under Model II, note that each expected mean square contains all the terms of the expected mean square that follows it in the table. Thus, an appropriate F statistic to test the statistical significance of any factor effect is the mean square of the factor of interest divided by the mean square immediately following it. Unbiased estimators of the variance components are also readily obtained using the customary analysis of variance procedure. In particular, the best unbiased estimators of the variance components under Model II are obtained simply by dividing the difference between the mean square of the factor of interest and the one immediately following it by the coefficient of the corresponding variance component; that is,


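$$
\begin{aligned}
\hat{\sigma}_e^2 &= MS_E, \\
\hat{\sigma}_\delta^2 &= (MS_{D(C)} - MS_E)/n, \\
\hat{\sigma}_\gamma^2 &= (MS_{C(B)} - MS_{D(C)})/dn, \\
\hat{\sigma}_\beta^2 &= (MS_{B(A)} - MS_{C(B)})/cdn,
\end{aligned}
$$
and
$$
\hat{\sigma}_\alpha^2 = (MS_A - MS_{B(A)})/bcdn.
$$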
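The "difference of successive mean squares" rule extends to any number of stages: the coefficient of a stage's component is the product of the numbers of levels at all lower stages (for example, $bcdn$ for factor A). A Python sketch of that rule, assuming the mean squares are listed from the top stage down to $MS_E$; the names and the mean squares in the usage line are hypothetical.

```python
import math


def model2_components(mean_squares, counts):
    """ANOVA estimates of variance components, balanced nested design, Model II.

    mean_squares: [MS_A, MS_B(A), ..., MS_E], top stage first, error last.
    counts: [b, c, ..., n], the number of levels nested within each level of
    the stage above, ending with the number of replicates n.
    Returns the estimates in the same top-down order, with sigma_e^2 last.
    """
    if len(counts) != len(mean_squares) - 1:
        raise ValueError("need one count per stage below the top factor")
    estimates = []
    for s in range(len(counts)):
        coef = math.prod(counts[s:])  # e.g. b*c*d*n for factor A
        estimates.append((mean_squares[s] - mean_squares[s + 1]) / coef)
    estimates.append(mean_squares[-1])
    return estimates


# Hypothetical four-way mean squares with b = c = d = n = 2
print(model2_components([100.0, 36.0, 20.0, 8.0, 4.0], [2, 2, 2, 2]))  # -> [4.0, 2.0, 3.0, 2.0, 4.0]
```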


As in Section 7.3, an exact confidence interval for $\sigma_e^2$ can be obtained by noting that $abcd(n-1)MS_E/\sigma_e^2$ has a chi-square distribution with $abcd(n-1)$ degrees of freedom. For a discussion of methods for constructing confidence intervals for other variance components, including certain sums and ratios of variance components, see Burdick and Graybill (1992, pp. 92-95).


7.6 GENERAL q-WAY NESTED CLASSIFICATION


The results of a nested classification can be readily generalized to the case of $q$ completely nested factors. Such a design is also called a $(q+1)$-stage nested design. The general $q$-way nested classification model is a direct extension of the model (7.5.1) and can be written as
$$
y_{ijk\ldots pqr} = \mu + \alpha_i + \beta_{j(i)} + \gamma_{k(ij)} + \cdots + \delta_{q(ijk\ldots p)} + e_{r(ijk\ldots pq)}, \qquad
\begin{cases} i = 1, \ldots, a \\ j = 1, \ldots, b \\ k = 1, \ldots, c \\ \;\;\vdots \\ r = 1, \ldots, n \end{cases} \tag{7.6.1}
$$
where $\mu$ is the general mean; $\alpha_i, \beta_{j(i)}, \gamma_{k(ij)}, \ldots, \delta_{q(ijk\ldots p)}$ are the effects due to the $i$-th level of factor A, the $j$-th level of factor B, the $k$-th level of factor C, ..., and the $q$-th level of factor Q, respectively; and $e_{r(ijk\ldots pq)}$ is the customary error term that represents the variation within each cell. The assumptions of the model (7.6.1) are readily stated depending upon whether the levels of factors A, B, C, ..., Q are fixed or random. For example, under Model II, the $\alpha_i$'s, $\beta_{j(i)}$'s, $\gamma_{k(ij)}$'s, ..., $\delta_{q(ijk\ldots p)}$'s, and $e_{r(ijk\ldots pq)}$'s are independent (normal) random variables with means of zero and variances $\sigma_\alpha^2, \sigma_\beta^2, \sigma_\gamma^2, \ldots, \sigma_\delta^2$, and $\sigma_e^2$, respectively. For this balanced model, all the mean squares are distributed as constant multiples of chi-square random variables, and they are statistically independent. Under the assumptions of the model of interest, the results on tests of hypotheses and estimation of variance components can be obtained in the manner described earlier. For a
discussion of methods for constructing confidence intervals for the variance 
components, see Burdick and Graybill (1992, Section 5.3). For the analysis of 
a general g-way nested classification with unequal numbers in the subclasses, 
we use the same approach as given in Section 7.4. The details on analysis 
of variance, tests of hypotheses, and variance components estimation can be 
found in Gates and Shiue (1962) and Gower (1962). Khuri (1990) presents 
some exact tests for random models when all stages except the last one are 
balanced. 


7.7 WORKED EXAMPLE FOR MODEL II 


Brownlee (1953, p. 117) reported data from an experiment carried out to deter- 
mine whether a batch of material was homogeneous. The material was sampled 
in six different vats. The matter from each vat was wrung in a centrifuge and 
bagged. Two bags were randomly selected from each vat and two samples were 
taken from each bag. Finally, for each sample, two independent determina- 
tions were made for the percentage of an ingredient. The data are given in 
Table 7.4. 

The experimental structure follows a three-way nested or hierarchical classification, and the mathematical model is
$$
y_{ijk\ell} = \mu + \alpha_i + \beta_{j(i)} + \gamma_{k(ij)} + e_{\ell(ijk)}, \qquad
\begin{cases} i = 1, \ldots, 6 \\ j = 1, 2 \\ k = 1, 2 \\ \ell = 1, 2 \end{cases}
$$
where $y_{ijk\ell}$ is the $\ell$-th determination (analysis) of the $k$-th sample of the $j$-th bag from the $i$-th vat, $\mu$ is the general mean, $\alpha_i$ is the effect of the $i$-th vat, $\beta_{j(i)}$ is the effect of the $j$-th bag within the $i$-th vat, $\gamma_{k(ij)}$ is the effect of the $k$-th sample within the $j$-th bag within the $i$-th vat, and $e_{\ell(ijk)}$ is the customary error term. Furthermore, in this example, it is reasonable to assume that all factors are random, and thus the $\alpha_i$'s, $\beta_{j(i)}$'s, $\gamma_{k(ij)}$'s, and $e_{\ell(ijk)}$'s are all independently and normally distributed with mean zero and variances $\sigma_\alpha^2$, $\sigma_\beta^2$, $\sigma_\gamma^2$, and $\sigma_e^2$, respectively.
Using the computational formulae for the sums of squares given in Section 7.2, we have
$$
\begin{aligned}
SS_T &= (29)^2 + (29)^2 + \cdots + (31)^2 - \frac{(1{,}368)^2}{6 \times 2 \times 2 \times 2} = 39{,}156 - 38{,}988 = 168.000, \\
SS_A &= \frac{(226)^2 + (215)^2 + \cdots + (242)^2}{2 \times 2 \times 2} - \frac{(1{,}368)^2}{6 \times 2 \times 2 \times 2} = 39{,}054.250 - 38{,}988 = 66.250, \\
SS_{B(A)} &= \frac{(113)^2 + (113)^2 + \cdots + (119)^2}{2 \times 2} - \frac{(226)^2 + (215)^2 + \cdots + (242)^2}{2 \times 2 \times 2} = 39{,}090.500 - 39{,}054.250 = 36.250, \\
SS_{C(B)} &= \frac{(58)^2 + (55)^2 + \cdots + (60)^2}{2} - \frac{(113)^2 + (113)^2 + \cdots + (119)^2}{2 \times 2} = 39{,}136 - 39{,}090.500 = 45.500,
\end{aligned}
$$
and
$$
SS_E = (29)^2 + (29)^2 + \cdots + (31)^2 - \frac{(58)^2 + (55)^2 + \cdots + (60)^2}{2} = 39{,}156 - 39{,}136 = 20.000.
$$


These results, along with the remaining calculations, are summarized in Table 7.5. The test of the hypothesis $H_0^{C}: \sigma_\gamma^2 = 0$ versus $H_1^{C}: \sigma_\gamma^2 > 0$ gives the variance ratio of 4.55, which is highly significant ($p < 0.001$). The test of the hypothesis $H_0^{B(A)}: \sigma_\beta^2 = 0$ versus $H_1^{B(A)}: \sigma_\beta^2 > 0$ gives the variance ratio of 1.59, which is not significant ($p = 0.232$). Finally, the test of the hypothesis $H_0^{A}: \sigma_\alpha^2 = 0$ versus $H_1^{A}: \sigma_\alpha^2 > 0$ gives the variance ratio of 2.19, which is again not significant ($p = 0.184$). However, note that the F test for vats has so few degrees of freedom that it may not be able to detect significant differences even if there really are important differences among them. Thus, we may conclude that although there seems to be significant variability among samples within bags and some variability between vats, there is no indication of any differences among bags within vats. The estimates of the variance components are given by
$$
\begin{aligned}
\hat{\sigma}_e^2 &= 0.833, \\
\hat{\sigma}_\gamma^2 &= \tfrac{1}{2}(3.792 - 0.833) = 1.480, \\
\hat{\sigma}_\beta^2 &= \tfrac{1}{4}(6.042 - 3.792) = 0.563,
\end{aligned}
$$
and
$$
\hat{\sigma}_\alpha^2 = \tfrac{1}{8}(13.250 - 6.042) = 0.901.
$$

TABLE 7.5
Analysis of Variance for the Material Homogeneity Data of Table 7.4

Source of               Degrees of   Sum of     Mean      Expected                                                          F       p-Value
Variation               Freedom      Squares    Square    Mean Square                                                       Value

Vats                         5        66.250    13.250    $\sigma_e^2 + 2\sigma_\gamma^2 + 2 \times 2\sigma_\beta^2 + 2 \times 2 \times 2\sigma_\alpha^2$    2.19    0.183
Bags (within vats)           6        36.250     6.042    $\sigma_e^2 + 2\sigma_\gamma^2 + 2 \times 2\sigma_\beta^2$                              1.59    0.232
Samples (within bags)       12        45.500     3.792    $\sigma_e^2 + 2\sigma_\gamma^2$                                                       4.55    <0.001
Error                       24        20.000     0.833    $\sigma_e^2$
Total                       47       168.000


These variance components account for 22.0, 39.2, 14.9, and 23.9 percent (for $\sigma_e^2$, $\sigma_\gamma^2$, $\sigma_\beta^2$, and $\sigma_\alpha^2$, respectively) of the total variation in material content in this experiment. It is evident from this analysis that the batch of material under investigation is highly inhomogeneous and that the larger part of this variability arises in bagging the material. The variability between repeated analyses on a given sample is also quite large, and there also seem to be appreciable differences between the contents of different vats.
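Each percentage is an estimate divided by the sum of the estimates. A short Python sketch; setting negative estimates to zero before summing is our assumption (the situation does not arise here), and the dictionary keys are illustrative. With the rounded estimates above it reproduces the quoted percentages to within rounding.

```python
def percent_of_total(components):
    """Express variance-component estimates as percentages of their sum.

    Negative estimates are set to zero first (an assumed convention).
    """
    clipped = {name: max(value, 0.0) for name, value in components.items()}
    total = sum(clipped.values())
    return {name: 100.0 * value / total for name, value in clipped.items()}


# Unpooled estimates for the material homogeneity data
estimates = {"error": 0.833, "samples": 1.480, "bags": 0.563, "vats": 0.901}
print({k: round(v, 1) for k, v in percent_of_total(estimates).items()})
```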
We further refine the preceding analysis by resorting to the method of pooling. In our earlier analysis we have seen that the variance component due to bags within vats ($\sigma_\beta^2$) is not statistically significant. We can thus pool its sum of squares with that for samples within bags to get a new estimate of $\sigma_e^2 + 2\sigma_\gamma^2$ equal to $(45.500 + 36.250)/18 = 4.542$ with 18 degrees of freedom. Now, the hypothesis on the variance component due to vats ($\sigma_\alpha^2$) is tested by comparing the between-vats mean square with this pooled mean square. The variance ratio for the test is $13.250/4.542 = 2.92$ with 5 and 18 degrees of freedom. Note that, in contrast to the unpooled analysis, this value is significant at the 5 percent level ($p = 0.041$). Finally, the pooled estimates of the variance components are now given by
$$
\begin{aligned}
\hat{\sigma}_\gamma^2 &= \tfrac{1}{2}(4.542 - 0.833) = 1.855, \\
\hat{\sigma}_\beta^2 &= 0,
\end{aligned}
$$
and
$$
\hat{\sigma}_\alpha^2 = \tfrac{1}{8}(13.250 - 4.542) = 1.089.
$$


These variance components account for 22.1, 49.1, 0, and 28.8 percent (for $\sigma_e^2$, $\sigma_\gamma^2$, $\sigma_\beta^2$, and $\sigma_\alpha^2$, respectively) of the total variation in the material. The results of the pooled analysis are similar to those of the earlier analysis. Thus, there is appreciable variability between vats. The variability between bags from a given vat is not large enough to be statistically significant. The variability between samples from a given bag is extremely large and, in fact, may account for nearly half of the total variation. The variability between duplicate analyses of a given sample is also quite large, probably the second most important component of variability in the process.
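The pooled analysis can be scripted in the same way. A Python sketch, assuming (as the text does) that $\sigma_\beta^2$ is taken to be zero after pooling, so that $E(MS_A) = \sigma_e^2 + n\sigma_\gamma^2 + bcn\sigma_\alpha^2$ and $\sigma_\alpha^2$ is estimated with divisor $bcn$; the function name and dictionary keys are illustrative.

```python
def pool_and_reestimate(ss, df, b, c, n):
    """Pool B(A) into C(B) and re-estimate components (three-way nested, Model II).

    ss and df are dicts with keys "A", "B(A)", "C(B)", "E".
    """
    df_pooled = df["B(A)"] + df["C(B)"]
    ms_pooled = (ss["B(A)"] + ss["C(B)"]) / df_pooled
    ms_a = ss["A"] / df["A"]
    ms_e = ss["E"] / df["E"]
    f_a = ms_a / ms_pooled  # refer to F with df["A"] and df_pooled degrees of freedom
    estimates = {
        "sigma_e^2": ms_e,
        "sigma_gamma^2": (ms_pooled - ms_e) / n,
        "sigma_beta^2": 0.0,
        "sigma_alpha^2": (ms_a - ms_pooled) / (b * c * n),
    }
    return ms_pooled, df_pooled, f_a, estimates


ss = {"A": 66.250, "B(A)": 36.250, "C(B)": 45.500, "E": 20.000}
df = {"A": 5, "B(A)": 6, "C(B)": 12, "E": 24}
print(pool_and_reestimate(ss, df, b=2, c=2, n=2))
```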


7.8 WORKED EXAMPLE FOR MODEL II: UNEQUAL NUMBERS 
IN THE SUBCLASSES 


Damon and Harvey (1987, p. 29) reported data from an experiment to determine 
the number of diatoms at different locations on a river. (The original data were 
supplied by Dr. Richard Larsen of the Department of Fisheries and Wild Life of 
the University of Massachusetts.) The experiment entailed determination of the 
number of diatoms at two randomly selected locations, two or three bricks at 
each location, and one or two slides attached to each brick. Thus, there are two 
hierarchies of nesting, bricks nested within locations and slides nested within 
bricks. The number of diatoms per square centimeter colonizing each glass slide was determined, and the data are given in Table 7.6.

The design structure follows a three-way nested or hierarchical classification, and the mathematical model is
$$
y_{ijk\ell} = \mu + \alpha_i + \beta_{j(i)} + \gamma_{k(ij)} + e_{\ell(ijk)}, \qquad
\begin{cases} i = 1, 2 \\ j = 1, \ldots, b_i \\ k = 1, \ldots, c_{ij} \\ \ell = 1, \ldots, n_{ijk} \end{cases} \tag{7.8.1}
$$
where $y_{ijk\ell}$ is the $\ell$-th observation on the $k$-th slide on the $j$-th brick at the $i$-th location. Here, the number of bricks at the $i$-th location is designated as $b_i$ ($b_1 = 3$, $b_2 = 2$), the number of slides in the $j(i)$-th brick (location) subclass is designated as $c_{ij}$ ($c_{11} = 2$, $c_{12} = 2$, $c_{13} = 1$, $c_{21} = 2$, $c_{22} = 2$), and the number of observations in the $k(ij)$-th slide (brick (location)) sub-subclass is designated as $n_{ijk}$ ($n_{111} = 6$, $n_{112} = 8$, $n_{121} = 4$, $n_{122} = 6$, $n_{131} = 4$, $n_{211} = 8$, $n_{212} = 7$, $n_{221} = 6$, $n_{222} = 5$). Furthermore, in the model equation (7.8.1), $\mu$ is the general mean, $\alpha_i$ is the effect of the $i$-th location, $\beta_{j(i)}$ is the effect of the $j$-th brick at the $i$-th location, $\gamma_{k(ij)}$ is the effect of the $k$-th slide on the $j$-th brick at the $i$-th location, and $e_{\ell(ijk)}$ is the customary error term. Finally, we assume that all factors are random; that is, the $\alpha_i$'s, $\beta_{j(i)}$'s, $\gamma_{k(ij)}$'s, and $e_{\ell(ijk)}$'s are independently and normally distributed with mean zero and variances $\sigma_\alpha^2$, $\sigma_\beta^2$, $\sigma_\gamma^2$, and $\sigma_e^2$, respectively.

TABLE 7.6
Number of Diatoms per Square Centimeter Colonizing Glass Slides*

Location         1       1       1       1       1       2       2       2       2
Brick            1       1       2       2       3       1       1       2       2
Slide            1       2       1       2       1       1       2       1       2

               102     500     142     119     243     500     822     826     642
               111     480     125     114     189     165     743     750     710
               400     112     221     461     362     752     263     682     720
               380     103     464     382     264     142     321     522     584
               210     225             510             921     620     650     650
               245     361             380             792     584     621
                       842                             871     841
                       657                             900

y_ijk.       1,448   3,280     952   1,966   1,058   5,043   4,194   4,051   3,306
n_ijk            6       8       4       6       4       8       7       6       5
y_ij..               4,728           2,918   1,058           9,237           7,357
n_ij.                   14              10       4              15              11
y_i...                                       8,704                          16,594
n_i..                                           28                              26

Source: Damon and Harvey (1987, p. 29). Used with permission.

* Numbers have been coded to simplify computation.


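All the quantities needed for the analysis of variance computations outlined in Section 7.4 can be readily computed on an electronic calculator. The results are:
$$
\begin{aligned}
\frac{y_{....}^2}{N} &= \frac{(25{,}298)^2}{54} = 11{,}851{,}644.52, \\
\sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{c_{ij}}\sum_{\ell=1}^{n_{ijk}} y_{ijk\ell}^2 &= 15{,}370{,}364, \\
\sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{c_{ij}} \frac{y_{ijk.}^2}{n_{ijk}} &= \frac{(1{,}448)^2}{6} + \frac{(3{,}280)^2}{8} + \cdots + \frac{(3{,}306)^2}{5} = 13{,}457{,}673.97, \\
\sum_{i=1}^{a}\sum_{j=1}^{b_i} \frac{y_{ij..}^2}{n_{ij.}} &= \frac{(4{,}728)^2}{14} + \frac{(2{,}918)^2}{10} + \cdots + \frac{(7{,}357)^2}{11} = 13{,}336{,}666.51,
\end{aligned}
$$
and
$$
\sum_{i=1}^{a} \frac{y_{i...}^2}{n_{i..}} = \frac{(8{,}704)^2}{28} + \frac{(16{,}594)^2}{26} = 13{,}296{,}501.96.
$$
Thus,
$$
\begin{aligned}
SS_T &= 15{,}370{,}364 - 11{,}851{,}644.52 = 3{,}518{,}719.48, \\
SS_A &= 13{,}296{,}501.96 - 11{,}851{,}644.52 = 1{,}444{,}857.44, \\
SS_{B(A)} &= 13{,}336{,}666.51 - 13{,}296{,}501.96 = 40{,}164.55, \\
SS_{C(B)} &= 13{,}457{,}673.97 - 13{,}336{,}666.51 = 121{,}007.46,
\end{aligned}
$$
and
$$
SS_E = 15{,}370{,}364 - 13{,}457{,}673.97 = 1{,}912{,}690.03.
$$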
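The corresponding degrees of freedom are computed as:
$$
\begin{aligned}
\text{Total:} \quad & N - 1 = 54 - 1 = 53, \\
\text{Locations:} \quad & a - 1 = 2 - 1 = 1, \\
\text{Bricks (within locations):} \quad & \sum_{i=1}^{a} b_i - a = 5 - 2 = 3, \\
\text{Slides (within bricks):} \quad & \sum_{i=1}^{a}\sum_{j=1}^{b_i} c_{ij} - \sum_{i=1}^{a} b_i = 9 - 5 = 4, \\
\text{Error:} \quad & N - \sum_{i=1}^{a}\sum_{j=1}^{b_i} c_{ij} = 54 - 9 = 45.
\end{aligned}
$$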
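Every computational term above has the form $\sum (\text{total})^2/(\text{number of observations in the total})$, so the whole calculation can be checked from the raw data of Table 7.6. A Python sketch, assuming the nested-list layout shown (location, then brick, then slide, then observations); the function and variable names are illustrative.

```python
def unbalanced_nested3_ss(y):
    """Sums of squares for a three-way nested design with unequal subclass sizes.

    Assumed layout: y[i][j][k] is the list of observations in level k of C
    within level j of B within level i of A (Section 7.4 formulas).
    """
    def sq_over_n(groups):
        # Sum over groups of (group total)^2 / (number in group)
        return sum(sum(g) ** 2 / len(g) for g in groups)

    c_groups = [obs for yi in y for yij in yi for obs in yij]
    b_groups = [[v for obs in yij for v in obs] for yi in y for yij in yi]
    a_groups = [[v for yij in yi for obs in yij for v in obs] for yi in y]
    all_obs = [v for g in a_groups for v in g]

    cf = sq_over_n([all_obs])
    q_a, q_b, q_c = sq_over_n(a_groups), sq_over_n(b_groups), sq_over_n(c_groups)
    raw = sum(v * v for v in all_obs)
    return {"A": q_a - cf, "B(A)": q_b - q_a, "C(B)": q_c - q_b,
            "E": raw - q_c, "T": raw - cf}


# Coded diatom counts of Table 7.6: location -> brick -> slide -> observations
DIATOMS = [
    [[[102, 111, 400, 380, 210, 245], [500, 480, 112, 103, 225, 361, 842, 657]],
     [[142, 125, 221, 464], [119, 114, 461, 382, 510, 380]],
     [[243, 189, 362, 264]]],
    [[[500, 165, 752, 142, 921, 792, 871, 900], [822, 743, 263, 321, 620, 584, 841]],
     [[826, 750, 682, 522, 650, 621], [642, 710, 720, 584, 650]]],
]
for source, value in unbalanced_nested3_ss(DIATOMS).items():
    print(f"{source:5s} {value:15,.2f}")
```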


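For appropriate tests of significance, we must evaluate the coefficients of the variance components in the expected mean squares and determine the linear combinations of mean squares to be used as the denominators of approximate F statistics using the Satterthwaite procedure. The basic quantities needed to determine the coefficients of the variance components are computed as follows:
$$
\begin{aligned}
N &= \sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{c_{ij}} n_{ijk} = 6 + 8 + \cdots + 5 = 54, \\
k_1 &= \frac{\sum_{i=1}^{a} n_{i..}^2}{N} = \frac{(28)^2 + (26)^2}{54} = 27.0370, \\
k_2 &= \frac{\sum_{i=1}^{a}\sum_{j=1}^{b_i} n_{ij.}^2}{N} = \frac{(14)^2 + (10)^2 + \cdots + (11)^2}{54} = 12.1852, \\
k_3 &= \frac{\sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{c_{ij}} n_{ijk}^2}{N} = \frac{(6)^2 + (8)^2 + \cdots + (5)^2}{54} = 6.3333, \\
k_4 &= \sum_{i=1}^{a} \frac{\sum_{j=1}^{b_i} n_{ij.}^2}{n_{i..}} = \frac{(14)^2 + (10)^2 + (4)^2}{28} + \frac{(15)^2 + (11)^2}{26} = 24.4506, \\
k_5 &= \sum_{i=1}^{a} \frac{\sum_{j=1}^{b_i}\sum_{k=1}^{c_{ij}} n_{ijk}^2}{n_{i..}} = \frac{(6)^2 + (8)^2 + \cdots + (4)^2}{28} + \frac{(8)^2 + (7)^2 + \cdots + (5)^2}{26} = 12.6923,
\end{aligned}
$$
and
$$
k_6 = \sum_{i=1}^{a}\sum_{j=1}^{b_i} \frac{\sum_{k=1}^{c_{ij}} n_{ijk}^2}{n_{ij.}} = \frac{(6)^2 + (8)^2}{14} + \frac{(4)^2 + (6)^2}{10} + \cdots + \frac{(6)^2 + (5)^2}{11} = 29.4217.
$$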


Now, the coefficients of the variance components in the expected mean square column are given by
$$
\begin{aligned}
n_1 &= \frac{N - k_6}{c - b} = \frac{54 - 29.4217}{4} = 6.1446, \\
n_2 &= \frac{k_6 - k_5}{b - a} = \frac{29.4217 - 12.6923}{3} = 5.5765, \\
n_3 &= \frac{N - k_4}{b - a} = \frac{54 - 24.4506}{3} = 9.8498, \\
n_4 &= \frac{k_5 - k_3}{a - 1} = \frac{12.6923 - 6.3333}{1} = 6.3590, \\
n_5 &= \frac{k_4 - k_2}{a - 1} = \frac{24.4506 - 12.1852}{1} = 12.2654,
\end{aligned}
$$
and
$$
n_6 = \frac{N - k_1}{a - 1} = \frac{54 - 27.0370}{1} = 26.9630.
$$
The results on sums of squares, mean squares, and expected mean squares are summarized in Table 7.7.

TABLE 7.7
Analysis of Variance for the Diatom Data of Table 7.6

Source of                    Degrees of   Sum of           Mean             Expected
Variation                    Freedom      Squares          Square           Mean Square

Locations                         1       1,444,857.44     1,444,857.440    $\sigma_e^2 + 6.3590\sigma_\gamma^2 + 12.2654\sigma_\beta^2 + 26.9630\sigma_\alpha^2$
Bricks (within locations)         3          40,164.55        13,388.183    $\sigma_e^2 + 5.5765\sigma_\gamma^2 + 9.8498\sigma_\beta^2$
Slides (within bricks)            4         121,007.46        30,251.865    $\sigma_e^2 + 6.1446\sigma_\gamma^2$
Error                            45       1,912,690.03        42,504.223    $\sigma_e^2$
Total                            53       3,518,719.48

The effect of slides within bricks can be tested directly against the error mean square, giving $F = 30{,}251.865/42{,}504.223 = 0.712$ ($p = 0.588$). The result is clearly nonsignificant. An approximate F test for bricks within locations can be obtained using the slides within bricks mean square, giving $F = 13{,}388.183/30{,}251.865 = 0.443$ ($p = 0.735$). However, to use the Satterthwaite procedure, we first compute the coefficients as
$$
\ell_2 = n_2/n_1 = 5.5765/6.1446 = 0.9075, \qquad \ell_1 = 1 - \ell_2 = 0.0925,
$$
and the synthesized mean square is
$$
0.0925(42{,}504.223) + 0.9075(30{,}251.865) = 31{,}385.208.
$$
The number of degrees of freedom for the synthesized mean square (rounded to the nearest integer) is
$$
\nu = \frac{(31{,}385.208)^2}{\dfrac{[0.0925(42{,}504.223)]^2}{45} + \dfrac{[0.9075(30{,}251.865)]^2}{4}} \approx 5.
$$
The F ratio based on the synthesized mean square is $F = 13{,}388.183/31{,}385.208 = 0.427$ ($p = 0.743$), which gives nearly the same result as before; that is, the effects of bricks within locations are also not significant. Similar procedures are used to test for location effects. For example, an approximate F test for location effects can be obtained using the bricks within locations mean square, giving $F = 1{,}444{,}857.440/13{,}388.183 = 107.920$ ($p = 0.002$). To use the Satterthwaite procedure, the coefficients are
$$
\begin{aligned}
\ell_5 &= n_5/n_3 = 12.2654/9.8498 = 1.2452, \\
\ell_4 &= n_4/n_1 - \ell_5 n_2/n_1 = (6.3590/6.1446) - 1.2452(5.5765/6.1446) = -0.0952, \\
\ell_3 &= 1 - \ell_4 - \ell_5 = 1 - (-0.0952) - 1.2452 = -0.1500,
\end{aligned}
$$
and the synthesized mean square is
$$
-0.1500(42{,}504.223) + (-0.0952)(30{,}251.865) + 1.2452(13{,}388.183) = 7{,}415.354.
$$
The number of degrees of freedom for the synthesized mean square (rounded to the nearest integer) is
$$
\nu = \frac{(7{,}415.354)^2}{\dfrac{[-0.1500(42{,}504.223)]^2}{45} + \dfrac{[-0.0952(30{,}251.865)]^2}{4} + \dfrac{[1.2452(13{,}388.183)]^2}{3}} = 2.
$$
The F ratio based on the synthesized mean square is $1{,}444{,}857.440/7{,}415.354 = 194.847$. Again, the result is highly significant ($p < 0.001$).

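The estimates of the variance components $\sigma_e^2$, $\sigma_\gamma^2$, $\sigma_\beta^2$, and $\sigma_\alpha^2$ are obtained as the solution to the following simultaneous equations:
$$
\begin{aligned}
1{,}444{,}857.440 &= \sigma_e^2 + 6.3590\sigma_\gamma^2 + 12.2654\sigma_\beta^2 + 26.9630\sigma_\alpha^2, \\
13{,}388.183 &= \sigma_e^2 + 5.5765\sigma_\gamma^2 + 9.8498\sigma_\beta^2, \\
30{,}251.865 &= \sigma_e^2 + 6.1446\sigma_\gamma^2,
\end{aligned}
$$
and
$$
42{,}504.223 = \sigma_e^2.
$$
Therefore, the desired estimates are given by
$$
\begin{aligned}
\hat{\sigma}_e^2 &= 42{,}504.223, \\
\hat{\sigma}_\gamma^2 &= \frac{30{,}251.865 - 42{,}504.223}{6.1446} = -1{,}994.004, \\
\hat{\sigma}_\beta^2 &= \frac{13{,}388.183 - 42{,}504.223 - 5.5765(-1{,}994.004)}{9.8498} = -1{,}827.091,
\end{aligned}
$$
and
$$
\hat{\sigma}_\alpha^2 = \frac{1{,}444{,}857.440 - 42{,}504.223 - 6.3590(-1{,}994.004) - 12.2654(-1{,}827.091)}{26.9630} = 53{,}311.690.
$$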
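Because the expected mean squares form a triangular system, the equations can be solved by back-substitution, starting from $MS_E$. A Python sketch with the coefficients of Table 7.7 passed in explicitly; negative solutions are returned unchanged, as in the text, and the function name is illustrative.

```python
def solve_nested3_ems(ms_a, ms_b, ms_c, ms_e, n1, n2, n3, n4, n5, n6):
    """Solve the expected-mean-square equations by back-substitution.

    E(MS_E)    = s_e
    E(MS_C(B)) = s_e + n1 s_gamma
    E(MS_B(A)) = s_e + n2 s_gamma + n3 s_beta
    E(MS_A)    = s_e + n4 s_gamma + n5 s_beta + n6 s_alpha
    """
    s_e = ms_e
    s_gamma = (ms_c - s_e) / n1
    s_beta = (ms_b - s_e - n2 * s_gamma) / n3
    s_alpha = (ms_a - s_e - n4 * s_gamma - n5 * s_beta) / n6
    return s_e, s_gamma, s_beta, s_alpha


print(solve_nested3_ems(1444857.440, 13388.183, 30251.865, 42504.223,
                        6.1446, 5.5765, 9.8498, 6.3590, 12.2654, 26.9630))
```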
The negative estimates probably indicate that the corresponding variance components are zero or negligible. The point estimates of the variance components are consistent with the results of the tests of hypotheses. It is further evident from the analysis that most of the variation in the number of diatoms is due to differences between locations on the river.


7.9 WORKED EXAMPLE FOR MODEL III


Sokal and Rohlf (1995, p. 289) reported data from an experiment designed to analyze the glycogen content of rat livers. For each of the three treatments used in the experiment (control, compound 217, and compound 217 plus sugar), three preparations of rat livers from each of two rats were analyzed, and duplicate readings were made for each preparation. The data are given in Table 7.8.

The design structure follows a three-way nested or hierarchical classification, and the mathematical model is
$$
y_{ijk\ell} = \mu + \alpha_i + \beta_{j(i)} + \gamma_{k(ij)} + e_{\ell(ijk)}, \qquad
\begin{cases} i = 1, 2, 3 \\ j = 1, 2 \\ k = 1, 2, 3 \\ \ell = 1, 2 \end{cases} \tag{7.9.1}
$$
where $y_{ijk\ell}$ is the $\ell$-th observation (reading) on the $k$-th preparation of the $j$-th rat for the $i$-th treatment, $\mu$ is the general mean, $\alpha_i$ is the effect of the $i$-th treatment, $\beta_{j(i)}$ is the effect of the $j$-th rat within the $i$-th treatment, $\gamma_{k(ij)}$ is the effect of the $k$-th preparation within the $j$-th rat within the $i$-th treatment, and $e_{\ell(ijk)}$ is the customary error term. Furthermore, the $\alpha_i$'s are considered to be fixed effects with $\sum_{i=1}^{3} \alpha_i = 0$, and the $\beta_{j(i)}$'s, $\gamma_{k(ij)}$'s, and $e_{\ell(ijk)}$'s are assumed to be independently and normally distributed with mean zero and variances $\sigma_\beta^2$, $\sigma_\gamma^2$, and $\sigma_e^2$, respectively.

Using the computational formulae for the sums of squares given in Section 7.2, we have

$$
\begin{aligned}
SS_T &= (131)^2 + (130)^2 + \cdots + (127)^2 - \frac{(5{,}120)^2}{3\times2\times3\times2}
      = 731{,}508 - 728{,}177.778 = 3{,}330.222,\\
SS_A &= \frac{(1{,}686)^2 + (1{,}812)^2 + (1{,}622)^2}{2\times3\times2} - \frac{(5{,}120)^2}{3\times2\times3\times2}
      = 729{,}735.333 - 728{,}177.778 = 1{,}557.556,\\
SS_{B(A)} &= \frac{(795)^2 + (891)^2 + \cdots + (816)^2}{3\times2} - \frac{(1{,}686)^2 + (1{,}812)^2 + (1{,}622)^2}{2\times3\times2}
      = 730{,}533 - 729{,}735.333 = 797.667,\\
SS_{C(B)} &= \frac{(261)^2 + (256)^2 + \cdots + (261)^2}{2} - \frac{(795)^2 + (891)^2 + \cdots + (816)^2}{3\times2}
      = 731{,}127 - 730{,}533 = 594.000,
\end{aligned}
$$

and

$$
SS_E = (131)^2 + (130)^2 + \cdots + (127)^2 - \frac{(261)^2 + (256)^2 + \cdots + (261)^2}{2}
     = 731{,}508 - 731{,}127 = 381.000.
$$
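The same sums of squares can be reproduced directly from the individual readings. The Python sketch below is an illustration, not part of the original example; the nested lists are a transcription of the glycogen readings of Table 7.8, and they reproduce the totals used above (grand total 5,120; treatment totals 1,686, 1,812, 1,622; rat totals 795, 891, ..., 816; preparation totals 261, 256, ..., 261).

```python
# Transcription of Table 7.8: data[treatment][rat][preparation] = (reading 1, reading 2)
data = [
    [  # control
        [(131, 130), (131, 125), (136, 142)],
        [(150, 148), (140, 143), (160, 150)],
    ],
    [  # compound 217
        [(157, 145), (154, 142), (147, 153)],
        [(151, 155), (147, 147), (162, 152)],
    ],
    [  # compound 217 plus sugar
        [(134, 125), (138, 138), (135, 136)],
        [(138, 140), (139, 138), (134, 127)],
    ],
]

a, b, c, n = 3, 2, 3, 2  # treatments, rats per treatment, preparations per rat, readings
readings = [y for t in data for r in t for p in r for y in p]
grand = sum(readings)
cf = grand ** 2 / (a * b * c * n)  # correction factor: (grand total)^2 / abcn

raw = sum(y * y for y in readings)
treat = sum(sum(y for r in t for p in r for y in p) ** 2 for t in data) / (b * c * n)
rats = sum(sum(y for p in r for y in p) ** 2 for t in data for r in t) / (c * n)
preps = sum(sum(p) ** 2 for t in data for r in t for p in r) / n

ss = {
    "total": raw - cf,
    "treatments": treat - cf,
    "rats(treatments)": rats - treat,
    "preparations(rats)": preps - rats,
    "error": raw - preps,
}
for source, value in ss.items():
    print(f"{source:<20} {value:10.3f}")
```

Each term is a difference of two consecutive "uncorrected" sums (raw, preparation, rat, treatment totals squared and divided by their counts), which is exactly the structure of the hand computation.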


These results along with the remaining computations are summarized in Table 7.9. The test of the hypothesis $H_0^C: \sigma^2_\gamma = 0$ versus $H_1^C: \sigma^2_\gamma > 0$ gives the variance ratio 2.34, which falls just short of its 5 percent critical value of 2.342 (p = 0.050). The test of the hypothesis $H_0^B: \sigma^2_\beta = 0$ versus $H_1^B: \sigma^2_\beta > 0$ gives the variance ratio 5.37, which clearly exceeds its 5 percent critical value of 3.49 (p = 0.014). Finally, the test of the hypothesis $H_0^A$: all $\alpha_i = 0$ versus $H_1^A$: not all $\alpha_i = 0$ gives the variance ratio 2.93, which is far below its 5 percent critical value of 9.55 (p = 0.197). Thus, we may conclude that although there is borderline evidence of differences among preparations within rats and significant differences among rats within treatments, there is no indication of any differences between the treatments. Note, however, that the F test for treatments has so few degrees of freedom (2 and 3) that it may fail to detect differences among treatments even when they are of real practical importance. A repetition of the experiment using more rats per treatment may therefore be indicated. The estimates of the variance components $\sigma^2_e$, $\sigma^2_\gamma$, and $\sigma^2_\beta$ are given by


$$
\hat\sigma^2_e = 21.167, \qquad
\hat\sigma^2_\gamma = \tfrac{1}{2}(49.500 - 21.167) = 14.167,
$$

and

$$
\hat\sigma^2_\beta = \tfrac{1}{6}(265.889 - 49.500) = 36.065.
$$

These variance components account for 29.6, 19.8, and 50.5 percent, respectively, of the total variation in glycogen content in this experiment. It is evident from this analysis that about half of the variability arises among rats within treatments. Readings within preparations and preparations within rats also account for a substantial portion of the total variability in the experiment. However, we cannot establish significant differences among treatments.
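Once the mean squares of Table 7.9 are available, the variance ratios and the variance component estimates above reduce to a few lines of arithmetic. A minimal Python sketch, assuming the mean squares 778.778, 265.889, 49.500, and 21.167 (on 2, 3, 12, and 18 degrees of freedom) and the expected-mean-square coefficients 2 (readings per preparation) and 6 (readings per rat); the critical values 9.55, 3.49, and 2.342 still have to be read from F tables:

```python
# Mean squares from Table 7.9 (glycogen example).
ms = {"treatments": 778.778, "rats": 265.889, "preparations": 49.500, "error": 21.167}

# In the nested mixed model each F ratio uses the mean square one level down as its denominator.
f_treatments = ms["treatments"] / ms["rats"]
f_rats = ms["rats"] / ms["preparations"]
f_preparations = ms["preparations"] / ms["error"]

# Equate mean squares to expectations:
# E(MS_E) = s_e^2, E(MS_C(B)) = s_e^2 + 2 s_gamma^2, E(MS_B(A)) = s_e^2 + 2 s_gamma^2 + 6 s_beta^2
var_error = ms["error"]
var_prep = (ms["preparations"] - ms["error"]) / 2
var_rat = (ms["rats"] - ms["preparations"]) / 6
total = var_error + var_prep + var_rat

print(f"F ratios: {f_treatments:.2f}, {f_rats:.2f}, {f_preparations:.2f}")
for name, v in [("readings", var_error), ("preparations", var_prep), ("rats", var_rat)]:
    print(f"{name:<13} {v:8.3f}  {100 * v / total:5.1f}%")
```

The percentages are shares of the estimated total random variance $\hat\sigma^2_e + \hat\sigma^2_\gamma + \hat\sigma^2_\beta$; the fixed treatment effects do not enter that total.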
7.10 USE OF STATISTICAL COMPUTING PACKAGES


The use of SAS, SPSS, and BMDP programs for analyzing three- and higher- 
order nested factors is the same as described in Section 6.15 for the case of 
two-way nested designs. No new problems arise for analysis involving higher- 
order nested designs. 


7.11 WORKED EXAMPLES USING STATISTICAL PACKAGES 


In this section, we illustrate the application of statistical packages to perform three-way nested analysis of variance for the data sets employed in the examples presented in Sections 7.7 through 7.9. Figures 7.2 through 7.4 illustrate the program instructions and the output results for analyzing the data in Tables 7.4, 7.6, and 7.8 using the SAS GLM, SPSS GLM, and BMDP 3V/8V procedures. The typical output provides the data format listed at the top, all cell means, and the entries of the analysis of variance table. Note that in each case the results are the same as those obtained by the manual computations in Sections 7.7 through 7.9. In an unbalanced design, however, certain tests of significance may differ from one program to another, since the programs use different types of sums of squares.
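The summary statistics that SAS GLM prints above the analysis of variance table are simple functions of its entries: R-Square is the model sum of squares divided by the corrected total sum of squares, Root MSE is the square root of the error mean square, and C.V. is 100 times Root MSE divided by the mean of the dependent variable. A small Python sketch (the helper name `fit_summary` is hypothetical) reproduces the values reported for the material homogeneity data in Figure 7.2:

```python
import math


def fit_summary(ss_model, ss_error, df_error, mean_y):
    """Return (R-Square, Root MSE, C.V.) computed from ANOVA table entries."""
    r_square = ss_model / (ss_model + ss_error)  # model SS / corrected total SS
    root_mse = math.sqrt(ss_error / df_error)    # square root of the error mean square
    cv = 100 * root_mse / mean_y                 # coefficient of variation, in percent
    return r_square, root_mse, cv


# Material homogeneity data: model SS 148 on 23 df, error SS 20 on 24 df, mean 28.5.
print(fit_summary(148.0, 20.0, 24, 28.5))
```

Applied to the glycogen output of Figure 7.4 (model SS 2,949.222, error SS 381 on 18 df, mean 142.222), the same function gives R-Square 0.8856 and C.V. 3.235, matching the printed values.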


(i) SAS application: SAS GLM instructions and output for the three-way random effects nested analysis of variance.

Instructions: DATA INGREDIENT; INPUT VAT BAG SAMPLE PERCENT; ... PROC GLM; CLASSES VAT BAG SAMPLE; MODEL PERCENT=VAT BAG(VAT) SAMPLE(BAG VAT); TEST H=VAT E=BAG(VAT); TEST H=BAG(VAT) E=SAMPLE(BAG VAT); RUN;

The SAS System, General Linear Models Procedure
Number of observations in data set = 48
Dependent Variable: PERCENT

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             23   148.00000000      6.43478261   7.72      0.0001
Error             24    20.00000000      0.83333333
Corrected Total   47   168.00000000

R-Square   C.V.        Root MSE     PERCENT Mean
0.880952   3.2030559   0.91287093   28.50000000

Source            DF   Type III SS   Mean Square   F Value   Pr > F
VAT                5   66.250000     13.250000     15.90     0.0001
BAG(VAT)           6   36.250000      6.041667      7.25     0.0002
SAMPLE(VAT*BAG)   12   45.500000      3.791667      4.55     0.0008

Source            Type III Expected Mean Square
VAT               Var(Error) + 2 Var(SAMPLE(VAT*BAG)) + 4 Var(BAG(VAT)) + 8 Var(VAT)
BAG(VAT)          Var(Error) + 2 Var(SAMPLE(VAT*BAG)) + 4 Var(BAG(VAT))
SAMPLE(VAT*BAG)   Var(Error) + 2 Var(SAMPLE(VAT*BAG))

Tests of Hypotheses using the Type III MS for BAG(VAT) as an error term
Source     DF   Type III SS   Mean Square   F Value   Pr > F
VAT         5   66.25000000   13.25000000   2.19      0.1834

Tests of Hypotheses using the Type III MS for SAMPLE(VAT*BAG) as an error term
Source     DF   Type III SS   Mean Square   F Value   Pr > F
BAG(VAT)    6   36.25000000    6.04166667   1.59      0.2316

FIGURE 7.2 Program Instructions and Output for the Three-Way Random Effects Nested Analysis of Variance: Material Homogeneity Data for Example of Section 7.7 (Table 7.4).

(ii) SPSS application: SPSS GLM instructions and output for the three-way random effects nested analysis of variance.

Instructions: DATA LIST /VAT 1 BAG 3 SAMPLE 5 PERCENT 7-8. ... GLM PERCENT BY VAT BAG SAMPLE /RANDOM VAT BAG SAMPLE /DESIGN VAT BAG(VAT) SAMPLE(BAG(VAT)).

Tests of Between-Subjects Effects, Dependent Variable: PERCENT
Source                          Type III SS   df   Mean Square   F       Sig.
VAT                Hypothesis   66.250         5   13.250        2.193   .183
                   Error        36.250         6    6.042(a)
BAG(VAT)           Hypothesis   36.250         6    6.042        1.593   .232
                   Error        45.500        12    3.792(b)
SAMPLE(BAG(VAT))   Hypothesis   45.500        12    3.792        4.550   .001
                   Error        20.000        24    0.833(c)
a MS(BAG(VAT))   b MS(SAMPLE(BAG(VAT)))   c MS(Error)

Expected Mean Squares (a, b)
                   Variance Component
Source             Var(VAT)   Var(BAG(VAT))   Var(SAMPLE(BAG))   Var(Error)
VAT                8.000      4.000           2.000              1.000
BAG(VAT)            .000      4.000           2.000              1.000
SAMPLE(BAG(VAT))    .000       .000           2.000              1.000
Error               .000       .000            .000              1.000
a For each source, the expected mean square equals the sum of the coefficients in the cells times the variance components, plus a quadratic term involving effects in the Quadratic Term cell.
b Expected Mean Squares are based on the Type III Sums of Squares.

(iii) BMDP application: BMDP 8V instructions and output for the three-way random effects nested analysis of variance.

Instructions: FILE='C:\SAHAI\TEXTO\EJE19.TXT'. FORMAT=FREE. VARIABLES=2. /VARIABLE NAMES=D1, D2. /DESIGN NAMES=V,B,S,D. LEVELS=6,2,2,2. RANDOM=V,B,S,D. MODEL='V,B(V),S(B),D(S)'. /END

BMDP8V - GENERAL MIXED MODEL ANALYSIS OF VARIANCE - EQUAL CELL SIZES   Release: 7.0 (BMDP/DYNAMIC)
ANALYSIS OF VARIANCE DESIGN: INDEX V B S D; NUM LEVELS 6 2 2 2; POPULATION SIZE INF INF INF INF; MODEL V, B(V), S(B), D(S)

ANALYSIS OF VARIANCE FOR DEPENDENT VARIABLE 1
SOURCE   ERROR TERM   SUM OF SQUARES   D.F.   MEAN SQUARE   F         PROB.
MEAN     VAT          38988.00000       1     38988.000     2942.49   0.0000
VAT      B(V)            66.25000       5        13.250        2.19   0.1834
B(V)     S(VB)           36.25000       6         6.042        1.59   0.2316
S(VB)    D(VBS)          45.50000      12         3.792        4.55   0.0008
D(VBS)                   20.00000      24         0.833

SOURCE   EXPECTED MEAN SQUARE          ESTIMATES OF VARIANCE COMPONENTS
MEAN     48(1)+8(2)+4(3)+2(4)+(5)      811.97396
VAT      8(2)+4(3)+2(4)+(5)              0.90104
B(V)     4(3)+2(4)+(5)                   0.56250
S(VB)    2(4)+(5)                        1.47917
D(VBS)   (5)                             0.83333

FIGURE 7.2 (continued)
(i) SAS application: SAS GLM instructions and output for the three-way random effects nested analysis of variance with unequal numbers in the subclasses.

Instructions: DATA DIATOMS; INPUT LOCATION BRICK SLIDE DIATOMS; DATALINES; ... ; PROC GLM; CLASSES LOCATION BRICK SLIDE; MODEL DIATOMS=LOCATION BRICK(LOCATION) SLIDE(BRICK LOCATION); RANDOM LOCATION BRICK(LOCATION) SLIDE(BRICK LOCATION) / TEST; RUN;
Class levels: LOCATION 2, BRICK 3, SLIDE 2. Number of observations in data set = 54.

The SAS System, General Linear Models Procedure
Dependent Variable: DIATOMS

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              8   1606029.4        200753.7      4.72      0.0003
Error             45   1912690.0         42504.2
Corrected Total   53   3518719.5

R-Square   C.V.       Root MSE   DIATOMS Mean
0.456424   44.00719   206.17     468.48

Source                  DF   Type I SS    Mean Square   F Value   Pr > F
LOCATION                 1   1444857.4    1444857.4     33.99     0.0001
BRICK(LOCATION)          3     40164.6      13388.2      0.31     0.8144
SLIDE(LOCATION*BRICK)    4    121007.5      30251.9      0.71     0.5882

Source                  DF   Type III SS  Mean Square   F Value   Pr > F
LOCATION                 1   1483396.9    1483396.9     34.90     0.0001
BRICK(LOCATION)          3     34822.4      11607.5      0.27     0.8445
SLIDE(LOCATION*BRICK)    4    121007.5      30251.9      0.71     0.5882

Source                  Type III Expected Mean Square
LOCATION                Var(Error) + 5.591 Var(SLIDE(LOCATION*BRICK)) + 10.271 Var(BRICK(LOCATION)) + 24.489 Var(LOCATION)
BRICK(LOCATION)         Var(Error) + 5.4151 Var(SLIDE(LOCATION*BRICK)) + 9.6922 Var(BRICK(LOCATION))
SLIDE(LOCATION*BRICK)   Var(Error) + 6.1446 Var(SLIDE(LOCATION*BRICK))

Source: LOCATION   Error: 1.0598*MS(BRICK(LOCATION)) - 0.024*MS(SLIDE(LOCATION*BRICK)) - 0.0357*MS(Error)
DF   Type III MS     Denominator DF   Denominator MS   F Value    Pr > F
1    1483396.9269    2.00             10055.801769     147.5165   0.0067

Source: BRICK(LOCATION)   Error: 0.8813*MS(SLIDE(LOCATION*BRICK)) + 0.1187*MS(Error)
DF   Type III MS     Denominator DF   Denominator MS   F Value
3    11607.477802    5.64             31706.429974     0.3661

Source: SLIDE(LOCATION*BRICK)   Error: MS(Error)
DF   Type III MS     Denominator DF   Denominator MS   F Value
4    30251.865341    45               42504.222937     0.7117

(ii) SPSS application: SPSS GLM instructions and output for the three-way random effects nested analysis of variance with unequal numbers in the subclasses.

Instructions: DATA LIST /LOCATION 1 BRICK 3 SLIDE 5 DIATOMS 7-9. BEGIN DATA. ... END DATA. GLM DIATOMS BY LOCATION BRICK SLIDE /RANDOM LOCATION BRICK SLIDE /DESIGN LOCATION BRICK(LOCATION) SLIDE(BRICK(LOCATION)).

Tests of Between-Subjects Effects, Dependent Variable: DIATOMS
Source                              Type III SS    df       Mean Square    F         Sig.
LOCATION              Hypothesis    1483396.927    1        1483396.927    147.517   .007
                      Error           20086.860    1.998      10055.802(a)
BRICK(LOC)            Hypothesis      34822.434    3          11607.478       .366   .781
                      Error          178807.755    5.639      31706.430(b)
SLIDE(BRICK(LOC))     Hypothesis     121007.461    4          30251.865       .712   .588
                      Error         1912690.000   45          42504.222(c)
a 1.060 MS(B(L)) - 2.403E-02 MS(S(B(L))) - 3.572E-02 MS(E)   b .881 MS(S(B(L))) + .119 MS(E)   c MS(E)

Expected Mean Squares (a, b)
                      Variance Component
Source                Var(LOC)   Var(B(LOC))   Var(S(B))   Var(Error)
LOCATION              24.489     10.271        5.591       1.000
BRICK(LOCATION)         .000      9.692        5.415       1.000
SLIDE(BRICK(LOC))       .000       .000        6.145       1.000
Error                   .000       .000         .000       1.000
a For each source, the expected mean square equals the sum of the coefficients in the cells times the variance components, plus a quadratic term involving effects in the Quadratic Term cell.
b Expected Mean Squares are based on the Type III Sums of Squares.

FIGURE 7.3 Program Instructions and Output for the Three-Way Random Effects Nested Analysis of Variance with Unequal Numbers in the Subclasses: Diatom Data for Example of Section 7.8 (Table 7.6).

(iii) BMDP application: BMDP 3V instructions and output for the three-way random effects nested analysis of variance with unequal numbers in the subclasses.

Instructions: /INPUT FILE='C:\SAHAI\TEXTO\EJE20.TXT'. FORMAT=FREE. VARIABLES=4. /VARIABLE NAMES=LOCATION, BRICK, SLIDE, DIATOM. CODES(LOCATION)=1,2. NAMES(LOCATION)=L1,L2. CODES(BRICK)=1,2,3. NAMES(BRICK)=B1,B2,B3. CODES(SLIDE)=1,2. NAMES(SLIDE)=S1,S2. DEPENDENT=DIATOM. RANDOM=LOCATION. RANDOM=LOCATION, BRICK. RANDOM=BRICK, SLIDE. RNAMES=L,'B(L)','S(B)'. METHOD=REML.

BMDP3V - GENERAL MIXED MODEL ANALYSIS OF VARIANCE   Release: 7.0 (BMDP/DYNAMIC)
DEPENDENT VARIABLE DIATOM

PARAMETER   ESTIMATE    STANDARD ERROR   EST/ST.DEV.   TWO-TAIL PROB. (ASYM. THEORY)
ERR.VAR.    39881.962    7821.496
CONSTANT      474.377     163.687        2.898         0.004
LOCATION    52107.607   75783.658
BRK(LOC)        0.000       0.000
SLD(BRK)        0.000       0.000

[Tests of fixed effects based on the asymptotic variance-covariance matrix: source CONSTANT; values not recoverable.]

FIGURE 7.3 (continued)


(i) SAS application: SAS GLM instructions and output for the three-way mixed effects nested analysis of variance.

Instructions: DATA GLYCOGEN; INPUT TREATMNT $ RAT PREPARAT GLYCOGEN; DATALINES; CONTROL 1 1 131 ... C217SUG 2 3 127; PROC GLM; CLASSES TREATMNT RAT PREPARAT; MODEL GLYCOGEN=TREATMNT RAT(TREATMNT) PREPARAT(RAT TREATMNT); RANDOM RAT(TREATMNT) PREPARAT(RAT TREATMNT) / TEST; RUN;
Class levels: TREATMNT 3, RAT 2, PREPARAT 3. Number of observations in data set = 36.

The SAS System, General Linear Models Procedure
Dependent Variable: GLYCOGEN

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             17   2949.22222       173.48366     8.20      0.0001
Error             18    381.00000        21.16667
Corrected Total   35   3330.22222

R-Square   C.V.       Root MSE   GLYCOGEN Mean
0.885593   3.234884   4.60072    142.222

Source                    DF   Type III SS   Mean Square   F Value   Pr > F
TREATMNT                   2   1557.55556    778.77778     36.79     0.0001
RAT(TREATMNT)              3    797.66667    265.88889     12.56     0.0001
PREPARAT(TREATMNT*RAT)    12    594.00000     49.50000      2.34     0.0503

Source                    Type III Expected Mean Square
TREATMNT                  Var(Error) + 2 Var(PREPARAT(TREATMNT*RAT)) + 6 Var(RAT(TREATMNT)) + Q(TREATMNT)
RAT(TREATMNT)             Var(Error) + 2 Var(PREPARAT(TREATMNT*RAT)) + 6 Var(RAT(TREATMNT))
PREPARAT(TREATMNT*RAT)    Var(Error) + 2 Var(PREPARAT(TREATMNT*RAT))

Tests of Hypotheses for Mixed Model Analysis of Variance
Source: TREATMNT   Error: MS(RAT(TREATMNT))
DF   Type III MS     Denominator DF   Denominator MS   F Value   Pr > F
2    778.77777778    3                265.88888889     2.9290    0.1971

Source: RAT(TREATMNT)   Error: MS(PREPARAT(TREATMNT*RAT))
DF   Type III MS     Denominator DF   Denominator MS   F Value   Pr > F
3    265.88888889    12               49.5             5.3715    0.0141

Source: PREPARAT(TREATMNT*RAT)   Error: MS(Error)
DF   Type III MS     Denominator DF   Denominator MS   F Value   Pr > F
12   49.5            18               21.16666666      2.3386    0.0503

FIGURE 7.4 Program Instructions and Output for the Three-Way Mixed Effects Nested Analysis of Variance: Glycogen Data for Example of Section 7.9 (Table 7.8).

(ii) SPSS application: SPSS GLM instructions and output for the three-way mixed effects nested analysis of variance.

Instructions: DATA LIST /TREATMNT 1 RAT 3 PREPARAT 5 GLYCOGEN 7-9. BEGIN DATA. 1 1 1 131 ... 3 2 3 127 END DATA. GLM GLYCOGEN BY TREATMNT RAT PREPARAT /RANDOM RAT PREPARAT /DESIGN TREATMNT RAT(TREATMNT) PREPARAT(RAT(TREATMNT)).

Tests of Between-Subjects Effects, Dependent Variable: GLYCOGEN
Source                                Type III SS   df   Mean Square   F       Sig.
TREATMNT                 Hypothesis   1557.556       2   778.778       2.929   .197
                         Error         797.667       3   265.889(a)
RAT(TREATMNT)            Hypothesis    797.667       3   265.889       5.371   .014
                         Error         594.000      12    49.500(b)
PREPARAT(RAT(TREATMNT))  Hypothesis    594.000      12    49.500       2.339   .050
                         Error         381.000      18    21.167(c)
a MS(RAT(TREATMNT))   b MS(PREPARAT(RAT(TREATMNT)))   c MS(E)

Expected Mean Squares (a, b)
                          Variance Component
Source                    Var(R(T))   Var(P(R(T)))   Var(Error)   Quadratic Term
TREATMNT                  6.000       2.000          1.000        TREATMNT
RAT(TREATMNT)             6.000       2.000          1.000
PREPARAT(RAT(TREATMNT))    .000       2.000          1.000
Error                      .000        .000          1.000
a For each source, the expected mean square equals the sum of the coefficients in the cells times the variance components, plus a quadratic term involving effects in the Quadratic Term cell.
b Expected Mean Squares are based on the Type III Sums of Squares.

(iii) BMDP application: BMDP 8V instructions and output for the three-way mixed effects nested analysis of variance.

Instructions: FILE='C:\SAHAI\TEXTO\EJE21.TXT'. FORMAT=FREE. VARIABLES=2. NAMES=R1, R2. NAMES=T,R,P,D. LEVELS=3,2,3,2. RANDOM=R,P,D. FIXED=T. MODEL='T,R(T),P(R),D(P)'. /END

BMDP8V - GENERAL MIXED MODEL ANALYSIS OF VARIANCE - EQUAL CELL SIZES   Release: 7.0 (BMDP/DYNAMIC)
ANALYSIS OF VARIANCE DESIGN: INDEX T R P D; NUM LEVELS 3 2 3 2; POPULATION SIZE 3 INF INF INF; MODEL T, R(T), P(R), D(P)

ANALYSIS OF VARIANCE FOR DEPENDENT VARIABLE 1
SOURCE     ERROR TERM   SUM OF SQUARES   D.F.   MEAN SQUARE   F         PROB.
MEAN       R(T)         728177.778        1     728177.78     2738.65   0.0000
TREATMNT   R(T)           1557.556        2        778.78        2.93   0.1971
R(T)       P(TR)           797.667        3        265.89        5.37   0.0141
P(TR)      D(TRP)          594.000       12         49.50        2.34   0.0503
D(TRP)                     381.000       18         21.17

SOURCE     EXPECTED MEAN SQUARE          ESTIMATES OF VARIANCE COMPONENTS
MEAN       36(1)+6(3)+2(4)+(5)           20219.77469
TREATMNT   12(2)+6(3)+2(4)+(5)              42.74074
R(T)       6(3)+2(4)+(5)                    36.06481
P(TR)      2(4)+(5)                         14.16667
D(TRP)     (5)                              21.16667

FIGURE 7.4 (continued)


EXERCISES 


1. An experiment is performed to investigate alloy hardness using a three-way nested design having two fixed alloys with different chemistries, three heats within each alloy, and two ingots within each heat; two determinations are made on each ingot. The data in certain standard units are given as follows.
(a) Describe the model and the assumptions for the experiment. It is assumed that alloys and heats are fixed and ingots are random.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in mean hardness levels between alloy chemistries. Use α = 0.05.

(d) Test whether there are differences in mean hardness levels between heats within alloys. Use α = 0.05.

(e) Test whether there are differences in mean hardness levels between ingots within heats. Use α = 0.05.

(f) Estimate the variance components of the model and determine 95 percent confidence intervals on them.


2. A chemical company wishes to examine the strength of a certain liquid chemical. The chemical is made in large vats and is then barreled. A random sample of three different vats is selected, three barrels are selected at random from each vat, and then two samples are taken from each barrel. Finally, two independent measurements are made on each sample. The data in certain standard units are given as follows.

Vat         1                        2                        3
Barrel      1       2       3        1       2       3        1       2       3
Sample      1   2   1   2   1   2    1   2   1   2   1   2    1   2   1   2   1   2
            4.3 4.0 4.3 4.6 4.7 4.9  4.8 4.6 4.7 4.5 4.3 4.5  5.0 5.3 5.1 5.0 5.0 5.1
            4.1 4.5 4.5 4.4 4.4 4.3  4.7 4.5 4.5 4.7 4.7 5.1  4.8 5.2 4.8 5.2 4.7 4.9


(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in mean strength levels between vats. Use α = 0.05.

(d) Test whether there are differences in mean strength levels between barrels within vats. Use α = 0.05.

(e) Test whether there are differences in mean strength levels between samples within barrels. Use α = 0.05.

(f) Estimate the variance components of the model and determine 95 percent confidence intervals on them.

3. Consider an experiment designed to study heat transfer in the molds 
utilized in manufacturing household plastics. A company has two 
plants that manufacture household plastics. Two furnaces are randomly 
selected from each plant and two molds are drawn from each furnace. 
The response variable of interest is the mold temperature, and five 
temperatures are recorded from each mold. The data from test results 
of the experiment are given as follows. 


Plant 1 2 
Furnace 1 2 1 2 
Mold 1 2 1 2 1 2 1 2 


Temperature (°C) 468 473 474 475 481 481 480 480 


(a) Describe the model and the assumptions for the experiment. It is assumed that the effect due to plant is a fixed effect, whereas furnace and mold are random factors.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in mean temperature levels between the plants. Use α = 0.05.

(d) Test whether there are differences in mean temperature levels between furnaces within plants. Use α = 0.05.

(e) Test whether there are differences in mean temperature levels between molds within furnaces. Use α = 0.05.

(f) Estimate the variance components of the model and determine 95 percent confidence intervals on them.

4. Bliss (1967, p. 354) reported data from an experiment designed to investigate variation in insecticide residue on celery. The experiment was carried out on 11 randomly selected plots of celery, which were sprayed with insecticide, and residue was measured on plants selected in three stages. Three samples of plants were selected from each plot, and one or two subsamples were selected from each sample. Finally, one or two independent measurements of residue were made on each subsample. The following data refer to a subset of 6 plots that have been randomly selected from the 11 plots in the experiment.
Plot 1 2 3
Sample 1 2 3 1 2 3 1 2 3 


Subsample 1 2 1 2 141 134 2 1% 2 1% 1% 2 1 #2 «1 


Residue 0.52 0.40 0.26 0.54 0.52 0.18 0.31 0.13 0.25 0.10 0.52 0.55 0.33 0.26 0.41
0.43 0.52 0.24 0.29 0.66 0.40 


Plot 4 5 6 
Sample 1 2 3 1 2 3 1 2 3 


Subsample 1 2 1 2 141 134 2 131% 2 7% 3% 2 134 2 «21 


Residue 0.77 0.51 0.44 0.50 0.44 0.50 0.60 0.60 0.71 0.92 0.24 0.48 0.53 0.50 0.39 
0.56 0.60 0.67 0.53 0.36 0.30 


Source: Bliss (1967, p. 354). Used with permission. 


(a) Describe the model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in mean residue levels between plots. Use α = 0.05.

(d) Test whether there are differences in mean residue levels between samples within plots. Use α = 0.05.

(e) Test whether there are differences in mean residue levels between subsamples within samples. Use α = 0.05.

(f) Estimate the variance components of the model and determine 95 percent confidence intervals on them.

5. Anderson and Bancroft (1952, p. 333) reported the results of an exper- 
iment designed to study some of the factors affecting the variability 
of estimates of various soil properties. The experiment was conducted 
on 20 fields by sampling two sections from each field. Two samples 
consisting of a composite of 20 borings were taken from each section 
and finally two subsamples were drawn from each sample. The data 
were analyzed for several soil properties and the following table gives 
an analysis of variance for the magnesium data. 


Analysis of Variance for the Magnesium Data

Source of Variation           Degrees of Freedom   Mean Square   Expected Mean Square
Field                                              0.1809
Section (within fields)                            0.0545
Sample (within sections)                           0.0080
Subsample (within samples)                         0.0005

Source: Anderson and Bancroft (1952, p. 333). Used with permission.


(a) State the model and the assumptions for the experiment.

(b) Complete the missing columns of the preceding analysis of variance table.

(c) Test whether there are differences in mean levels of magnesium between samples within sections. Use α = 0.05.

(d) Test whether there are differences in mean levels of magnesium between sections within fields. Use α = 0.05.

(e) Test whether there are differences in mean levels of magnesium between fields. Use α = 0.05.

(f) Estimate the variance components of the model and determine 95 percent confidence intervals on them.

6. Anderson* and Bancroft (1952, pp. 334-335) described an experiment designed to test various molds for their efficacy in the manufacture of streptomycin. A trial experiment to assess variability at various stages of the production process is to be run. There are five stages in the production process: the initial incubation stage in a test tube, a primary inoculation period in a petri dish, a secondary inoculation period, a fermentation period in a bath, and the final assay of the quantity of streptomycin produced. The numbers of test tubes to be used at the different stages are as follows: a = 5, b = 2, c = 2, d = 2, and n = 2, giving a total of 80 assays for the final analysis. Let $\sigma^2_\alpha$, $\sigma^2_{\beta(\alpha)}$, $\sigma^2_{\gamma(\beta)}$, $\sigma^2_{\delta(\gamma)}$, and $\sigma^2_e$ be the variance components associated with the five stages of the production process, and consider the following analysis of variance table.


Analysis of Variance for the Streptomycin Production Data

Source of Variation                    Degrees of Freedom   Mean Square   Expected Mean Square
Incubation stage                                            MS_A
Primary inoculation                                         MS_B(A)
  (within incubation stage)
Secondary inoculation                                       MS_C(B)
  (within primary inoculation)
Fermentation                                                MS_D(C)
  (within secondary inoculation)
Final assay                                                 MS_E(D)
  (within fermentation)
Error                                                       MS_E


(a) State the model and the assumptions for the experiment, assuming that all effects are random.

(b) Complete the missing columns of the preceding analysis of variance table.

(c) Determine algebraic expressions for the estimates of the variance components as functions of the mean squares.


* Dr. R.L. Anderson first proposed this design for an experiment conducted at Purdue University in 1950. Used with permission.


8 Partially Nested 
Classifications 


8.0 PREVIEW 


In the preceding chapters, we discussed classification models involving several factors that are either all crossed or all nested. Occasionally, in a multifactor experiment, some factors are crossed and others are nested. Such designs are called partially nested (hierarchical), crossed-nested, nested-factorial, or mixed-classification designs. For example, suppose that in an industrial experiment it is desired to test three different methods of a production process. For each method, five operators are employed. The experiment is carried out over a period of four days, and three observations are obtained for each combination of method, operator, and day. Because of the nature of the experiment, the five operators employed under Method I are different individuals from the five operators under Method II or Method III, and the five operators under Method II are different from those under Method III. The physical layout of such an experiment can be depicted schematically as shown in Figure 8.1. In this experiment, the days are crossed with the methods and operators, and operators are nested within methods.


8.1 MATHEMATICAL MODEL 


Consider three factors A, B, and C having a, b, and c levels, respectively. Let the b levels of factor B be nested under each level of A, and let the c levels of factor C be crossed with the a levels of factor A and the b levels of factor B. The model for this type of experimental layout can be written as

$$
y_{ijk\ell} = \mu + \alpha_i + \beta_{j(i)} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk(i)} + e_{\ell(ijk)},
\qquad
\begin{cases} i = 1, \ldots, a \\ j = 1, \ldots, b \\ k = 1, \ldots, c \\ \ell = 1, \ldots, n \end{cases}
\qquad (8.1.1)
$$


where $\mu$ is the general mean, $\alpha_i$ is the effect due to the $i$-th level of factor A, $\beta_{j(i)}$ is the effect due to the $j$-th level of factor B within the $i$-th level of factor A, $\gamma_k$ is the effect due to the $k$-th level of factor C, $(\alpha\gamma)_{ik}$ is the interaction of the $i$-th level of factor A with the $k$-th level of factor C, $(\beta\gamma)_{jk(i)}$ is the interaction of the $j$-th level of factor B with the $k$-th level of factor C within the $i$-th level of factor A, and $e_{\ell(ijk)}$ is the usual error term. Notice that no A × B interaction can exist, because the levels of factor B occur within different levels of factor A. Similarly, there can be no three-way interaction A × B × C.

[Figure 8.1: schematic grid with days 1, 2, 3, 4 across the top and operators 1 to 5 within each of Methods I, II, and III down the side.]

FIGURE 8.1 A Layout for the Partially Nested Design Where Days Are Crossed with Methods and Operators Are Nested within Methods.

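Under Model I, the $\alpha_i$'s, $\beta_{j(i)}$'s, $\gamma_k$'s, $(\alpha\gamma)_{ik}$'s, and $(\beta\gamma)_{jk(i)}$'s are constants subject to the restrictions

$$
\begin{aligned}
&\sum_{i=1}^{a}\alpha_i = \sum_{k=1}^{c}\gamma_k = 0, \qquad
\sum_{i=1}^{a}(\alpha\gamma)_{ik} = \sum_{k=1}^{c}(\alpha\gamma)_{ik} = 0, \\
&\sum_{j=1}^{b}\beta_{j(i)} = 0 \quad \text{for each } i, \\
&\sum_{j=1}^{b}(\beta\gamma)_{jk(i)} = 0 \quad \text{for each } (i, k), \qquad
\sum_{k=1}^{c}(\beta\gamma)_{jk(i)} = 0 \quad \text{for each } j(i),
\end{aligned}
$$

and the $e_{\ell(ijk)}$'s are uncorrelated and randomly distributed with mean zero and variance $\sigma^2_e$. However, for fixed $j$ and $k$, the $(\beta\gamma)_{jk(i)}$'s do not sum to zero over $i$.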

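Under Model II, we assume that the $\alpha_i$'s, $\beta_{j(i)}$'s, $\gamma_k$'s, $(\alpha\gamma)_{ik}$'s, $(\beta\gamma)_{jk(i)}$'s, and $e_{\ell(ijk)}$'s are uncorrelated and randomly distributed with zero means and variances $\sigma^2_\alpha$, $\sigma^2_{\beta(\alpha)}$, $\sigma^2_\gamma$, $\sigma^2_{\alpha\gamma}$, $\sigma^2_{\beta\gamma(\alpha)}$, and $\sigma^2_e$, respectively. Thus, $\sigma^2_\alpha$, $\sigma^2_{\beta(\alpha)}$, $\sigma^2_\gamma$, $\sigma^2_{\alpha\gamma}$, $\sigma^2_{\beta\gamma(\alpha)}$, and $\sigma^2_e$ are the variance components of the model (8.1.1).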

Various types of mixed models are possible and their assumptions can analogously be stated. For example, with A and C fixed and B random, we assume that the $\alpha_i$'s, $\gamma_k$'s, and $(\alpha\gamma)_{ik}$'s are constants subject to the restrictions

$$
\sum_{i=1}^{a}\alpha_i = \sum_{k=1}^{c}\gamma_k = 0, \qquad
\sum_{i=1}^{a}(\alpha\gamma)_{ik} = \sum_{k=1}^{c}(\alpha\gamma)_{ik} = 0.
$$

Furthermore, the $\beta_{j(i)}$'s, $(\beta\gamma)_{jk(i)}$'s, and $e_{\ell(ijk)}$'s are randomly distributed with zero means and variances $\sigma^2_{\beta(\alpha)}$, $\sigma^2_{\beta\gamma(\alpha)}$, and $\sigma^2_e$, respectively; and the three groups of random variables are pairwise uncorrelated. The random effects $(\beta\gamma)_{jk(i)}$'s, however, are correlated due to the restrictions

$$
\sum_{k=1}^{c}(\beta\gamma)_{jk(i)} = 0 \quad \text{for each } j(i).
$$


8.2 ANALYSIS OF VARIANCE 


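The identity corresponding to the model (8.1.1) is

$$
\begin{aligned}
y_{ijk\ell} - \bar{y}_{....} ={}& (\bar{y}_{i...} - \bar{y}_{....}) + (\bar{y}_{ij..} - \bar{y}_{i...}) + (\bar{y}_{..k.} - \bar{y}_{....}) \\
&+ (\bar{y}_{i.k.} - \bar{y}_{i...} - \bar{y}_{..k.} + \bar{y}_{....}) + (\bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{i.k.} + \bar{y}_{i...}) \\
&+ (y_{ijk\ell} - \bar{y}_{ijk.}). \qquad (8.2.1)
\end{aligned}
$$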


Note that the terms on the right-hand side of (8.2.1) are the sample estimates of the terms on the right-hand side of the model (8.1.1), excluding the grand mean. The first and third terms are similar to main effects in a crossed classification model (5.1.1). The second term is analogous to an ordinary nested term such as the second term in (6.3.1). The fourth term is an ordinary two-way interaction similar to the fifth term of (5.3.1). The fifth term can be obtained as the difference between $\bar{y}_{ijk.}$ and the sum of the general mean $\bar{y}_{....}$, the factor A effect ($\bar{y}_{i...} - \bar{y}_{....}$), the factor B within A effect ($\bar{y}_{ij..} - \bar{y}_{i...}$), the factor C effect ($\bar{y}_{..k.} - \bar{y}_{....}$), and the A × C interaction ($\bar{y}_{i.k.} - \bar{y}_{i...} - \bar{y}_{..k.} + \bar{y}_{....}$); that is,

$$
\begin{aligned}
&\bar{y}_{ijk.} - \left[\bar{y}_{....} + (\bar{y}_{i...} - \bar{y}_{....}) + (\bar{y}_{ij..} - \bar{y}_{i...}) + (\bar{y}_{..k.} - \bar{y}_{....}) + (\bar{y}_{i.k.} - \bar{y}_{i...} - \bar{y}_{..k.} + \bar{y}_{....})\right] \\
&\qquad = \bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{i.k.} + \bar{y}_{i...}. \qquad (8.2.2)
\end{aligned}
$$


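Alternatively, partially hierarchical models can be looked upon as degenerate cases of completely crossed models. For example, suppose that the B effect is fully crossed with A, so that there will be a B main effect $\bar{y}_{.j..} - \bar{y}_{....}$ and an A × B interaction $\bar{y}_{ij..} - \bar{y}_{i...} - \bar{y}_{.j..} + \bar{y}_{....}$. Now, noting that the B effect is not really a main effect and combining it with its interaction with A, we obtain

$$
(\bar{y}_{.j..} - \bar{y}_{....}) + (\bar{y}_{ij..} - \bar{y}_{i...} - \bar{y}_{.j..} + \bar{y}_{....}) = \bar{y}_{ij..} - \bar{y}_{i...}, \qquad (8.2.3)
$$

which is precisely the second term on the right-hand side of (8.2.1). Similarly, if B were a crossed effect, then it would have an interaction with C, and its interaction with A would also have an interaction with C. But since B is not really a crossed effect, these two interactions are combined to obtain

$$
\begin{aligned}
&(\bar{y}_{.jk.} - \bar{y}_{.j..} - \bar{y}_{..k.} + \bar{y}_{....})
+ (\bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{i.k.} - \bar{y}_{.jk.} + \bar{y}_{i...} + \bar{y}_{.j..} + \bar{y}_{..k.} - \bar{y}_{....}) \\
&\qquad = \bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{i.k.} + \bar{y}_{i...}, \qquad (8.2.4)
\end{aligned}
$$

which is equivalent to (8.2.2).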
The same reasoning also holds in the determination of the degrees of freedom. For the B within A effect, each level of A contributes $b - 1$ degrees of freedom, and since there are $a$ levels of A, the total number of degrees of freedom is $a(b - 1)$. Alternatively, using the argument of (8.2.3), the degrees of freedom would be

$$
(b - 1) + (a - 1)(b - 1) = a(b - 1), \qquad (8.2.5)
$$

which gives exactly the same value. For the B × C within A interaction, since C has $c - 1$ degrees of freedom and B within A has $a(b - 1)$ degrees of freedom, their interaction will have $a(b - 1)(c - 1)$ degrees of freedom. From the argument of (8.2.4), the degrees of freedom will be

$$
(b - 1)(c - 1) + (a - 1)(b - 1)(c - 1) = a(b - 1)(c - 1), \qquad (8.2.6)
$$

which again gives the same result.
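Now, squaring both sides of (8.2.1) and summing over all indices (all cross-product terms vanish), we obtain the following partition of the total sum of squares:

$$
SS_T = SS_A + SS_{B(A)} + SS_C + SS_{AC} + SS_{BC(A)} + SS_E,
$$

where

$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n} (y_{ijk\ell} - \bar{y}_{....})^2,$$

$$SS_A = bcn\sum_{i=1}^{a} (\bar{y}_{i...} - \bar{y}_{....})^2,$$

$$SS_{B(A)} = cn\sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{y}_{ij..} - \bar{y}_{i...})^2,$$

$$SS_C = abn\sum_{k=1}^{c} (\bar{y}_{..k.} - \bar{y}_{....})^2,$$

$$SS_{AC} = bn\sum_{i=1}^{a}\sum_{k=1}^{c} (\bar{y}_{i.k.} - \bar{y}_{i...} - \bar{y}_{..k.} + \bar{y}_{....})^2,$$

$$SS_{BC(A)} = n\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (\bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{i.k.} + \bar{y}_{i...})^2,$$

and

$$SS_E = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n} (y_{ijk\ell} - \bar{y}_{ijk.})^2.$$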
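The definitional formulas translate directly into array operations on cell, row, and column means. The following is a minimal Python sketch. It assumes NumPy is available and that balanced data are stored in an array `y` of shape (a, b, c, n) indexed as `y[i, j, k, l]`. The function name `partially_nested_ss` is hypothetical and does not come from any package.

```python
import numpy as np


def partially_nested_ss(y: np.ndarray) -> dict:
    """Sums of squares for balanced data y[i, j, k, l]: A (i), B within A (j),
    C crossed with A and B (k), replicates (l)."""
    a, b, c, n = y.shape
    m = y.mean()                       # grand mean, y-bar....
    m_i = y.mean(axis=(1, 2, 3))       # y-bar_i...   shape (a,)
    m_ij = y.mean(axis=(2, 3))         # y-bar_ij..   shape (a, b)
    m_k = y.mean(axis=(0, 1, 3))       # y-bar_..k.   shape (c,)
    m_ik = y.mean(axis=(1, 3))         # y-bar_i.k.   shape (a, c)
    m_ijk = y.mean(axis=3)             # cell means   shape (a, b, c)

    ss = {
        "A": b * c * n * np.sum((m_i - m) ** 2),
        "B(A)": c * n * np.sum((m_ij - m_i[:, None]) ** 2),
        "C": a * b * n * np.sum((m_k - m) ** 2),
        "AC": b * n * np.sum((m_ik - m_i[:, None] - m_k[None, :] + m) ** 2),
        "BC(A)": n * np.sum((m_ijk - m_ij[:, :, None]
                             - m_ik[:, None, :] + m_i[:, None, None]) ** 2),
        "Error": np.sum((y - m_ijk[..., None]) ** 2),
    }
    ss["Total"] = np.sum((y - m) ** 2)
    return ss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.normal(size=(4, 4, 2, 2))          # made-up data with a = 4, b = 4, c = 2, n = 2
    ss = partially_nested_ss(y)
    parts = sum(v for key, v in ss.items() if key != "Total")
    print(np.isclose(parts, ss["Total"]))      # the six components reproduce SS_T
```

Broadcasting with `None` axes reproduces the dotted-subscript means of each bracket term, so the code mirrors the formulas line by line.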


The corresponding mean squares are denoted by $MS_A$, $MS_{B(A)}$, $MS_C$, $MS_{AC}$, $MS_{BC(A)}$, and $MS_E$, respectively. The expected values of the mean squares can be derived as before. Bennett and Franklin (1954, pp. 410-427) give a general procedure for obtaining the expected values in partially nested classifications. The resultant analysis of variance is shown in Table 8.1. The proper test statistic for any main effect or interaction of interest can be obtained from an examination of the analysis of variance table. Estimates of the variance components are obtained by equating the mean squares to their respective expected values and solving the resultant equations for the corresponding variance components.


Remark: In a partially nested situation, it is useful to remember the following rule of thumb for calculating the degrees of freedom. The number of degrees of freedom for a crossed factor is one less than the number of levels of the factor; for a nested factor, the number of degrees of freedom is this quantity multiplied by the product of the numbers of levels of all the factors within which it is nested.
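The rule of thumb can be written out as a small function. The sketch below is in Python. The function name is hypothetical, and the sketch assumes, as in (8.2.6), that the degrees of freedom of an interaction are the product of those of its components.

```python
def partially_nested_df(a: int, b: int, c: int, n: int) -> dict:
    """Degrees of freedom for the three-factor partially nested layout:
    A crossed, B nested within A, C crossed with A and B, n replicates per cell."""
    df = {
        "A": a - 1,                       # crossed factor: levels - 1
        "B(A)": (b - 1) * a,              # nested factor: (levels - 1) x levels of A
        "C": c - 1,                       # crossed factor
        "AC": (a - 1) * (c - 1),          # interaction: product of component df
        "BC(A)": (b - 1) * (c - 1) * a,   # nested interaction, as in (8.2.6)
        "Error": a * b * c * (n - 1),     # replicates within cells
    }
    df["Total"] = a * b * c * n - 1
    return df


if __name__ == "__main__":
    turnip = partially_nested_df(a=4, b=4, c=2, n=2)   # layout of the example in Section 8.5
    print(turnip)
    assert sum(v for k, v in turnip.items() if k != "Total") == turnip["Total"]
```

For the turnip-leaf layout of Section 8.5, this gives 3, 12, 1, 3, 12, and 32 degrees of freedom, which add to the total of 63.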
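8.3 COMPUTATIONAL FORMULAE AND PROCEDURE


The following formulae may be used for calculating the sums of squares:

$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n} y_{ijk\ell}^2 - \frac{y_{....}^2}{abcn},$$

$$SS_A = \frac{1}{bcn}\sum_{i=1}^{a} y_{i...}^2 - \frac{y_{....}^2}{abcn},$$

$$SS_{B(A)} = \frac{1}{cn}\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij..}^2 - \frac{1}{bcn}\sum_{i=1}^{a} y_{i...}^2,$$

$$SS_C = \frac{1}{abn}\sum_{k=1}^{c} y_{..k.}^2 - \frac{y_{....}^2}{abcn},$$

$$SS_{AC} = \frac{1}{bn}\sum_{i=1}^{a}\sum_{k=1}^{c} y_{i.k.}^2 - \frac{1}{bcn}\sum_{i=1}^{a} y_{i...}^2 - \frac{1}{abn}\sum_{k=1}^{c} y_{..k.}^2 + \frac{y_{....}^2}{abcn},$$

$$SS_{BC(A)} = \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} y_{ijk.}^2 - \frac{1}{cn}\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij..}^2 - \frac{1}{bn}\sum_{i=1}^{a}\sum_{k=1}^{c} y_{i.k.}^2 + \frac{1}{bcn}\sum_{i=1}^{a} y_{i...}^2,$$

and

$$SS_E = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n} y_{ijk\ell}^2 - \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} y_{ijk.}^2.$$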

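Examining the forms of $SS_{B(A)}$ and $SS_{BC(A)}$, we notice that these formulae can be written as

$$SS_{B(A)} = \sum_{i=1}^{a}\left[\frac{1}{cn}\sum_{j=1}^{b} y_{ij..}^2 - \frac{y_{i...}^2}{bcn}\right]$$

and

$$SS_{BC(A)} = \sum_{i=1}^{a}\left[\frac{1}{n}\sum_{j=1}^{b}\sum_{k=1}^{c} y_{ijk.}^2 - \frac{1}{cn}\sum_{j=1}^{b} y_{ij..}^2 - \frac{1}{bn}\sum_{k=1}^{c} y_{i.k.}^2 + \frac{y_{i...}^2}{bcn}\right].$$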


Thus, $SS_{B(A)}$ can be obtained by first calculating the sum of squares among levels of B for each level of A and then pooling over all levels of A; similarly, $SS_{BC(A)}$ can be obtained by first computing the B × C interaction sum of squares for each level of A and then pooling over all levels of A. Their degrees of freedom can be determined in the same manner: $b - 1$ and $(b - 1)(c - 1)$ for each level of A, giving $a(b - 1)$ and $a(b - 1)(c - 1)$ in total.
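The equivalence between the totals-based formulae and the "compute within each level of A, then pool" forms can be checked numerically. The following Python sketch assumes NumPy and balanced data in an array of shape (a, b, c, n). Both function names are hypothetical.

```python
import numpy as np


def ss_from_totals(y: np.ndarray) -> dict:
    """Section 8.3 computational formulae, using totals rather than means.
    y has shape (a, b, c, n) and is indexed y[i, j, k, l]."""
    a, b, c, n = y.shape
    t_i = y.sum(axis=(1, 2, 3))          # y_i...
    t_ij = y.sum(axis=(2, 3))            # y_ij..
    t_k = y.sum(axis=(0, 1, 3))          # y_..k.
    t_ik = y.sum(axis=(1, 3))            # y_i.k.
    t_ijk = y.sum(axis=3)                # y_ijk.
    cf = y.sum() ** 2 / (a * b * c * n)  # correction term y....^2 / abcn
    q_i = np.sum(t_i ** 2) / (b * c * n)
    q_ij = np.sum(t_ij ** 2) / (c * n)
    q_k = np.sum(t_k ** 2) / (a * b * n)
    q_ik = np.sum(t_ik ** 2) / (b * n)
    q_ijk = np.sum(t_ijk ** 2) / n
    raw = np.sum(y ** 2)
    return {
        "A": q_i - cf,
        "B(A)": q_ij - q_i,
        "C": q_k - cf,
        "AC": q_ik - q_i - q_k + cf,
        "BC(A)": q_ijk - q_ij - q_ik + q_i,
        "Error": raw - q_ijk,
        "Total": raw - cf,
    }


def pooled_within_a(y: np.ndarray) -> tuple:
    """SS_B(A) and SS_BC(A) computed level by level of A, then pooled."""
    a, b, c, n = y.shape
    ss_b, ss_bc = 0.0, 0.0
    for i in range(a):
        yi = y[i]                                           # (b, c, n) slice for level i of A
        t = yi.sum()                                        # y_i...
        q_j = np.sum(yi.sum(axis=(1, 2)) ** 2) / (c * n)    # from y_ij..
        q_k = np.sum(yi.sum(axis=(0, 2)) ** 2) / (b * n)    # from y_i.k.
        q_jk = np.sum(yi.sum(axis=2) ** 2) / n              # from y_ijk.
        ss_b += q_j - t ** 2 / (b * c * n)                  # among-B SS within level i
        ss_bc += q_jk - q_j - q_k + t ** 2 / (b * c * n)    # B x C SS within level i
    return ss_b, ss_bc


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = rng.integers(-10, 20, size=(2, 3, 4, 2)).astype(float)   # made-up data
    s = ss_from_totals(y)
    ss_b, ss_bc = pooled_within_a(y)
    print(np.isclose(s["B(A)"], ss_b), np.isclose(s["BC(A)"], ss_bc))
```

Working from totals avoids repeated division when the computations are done by hand. The pooled form shows that each nested sum of squares is simply an ordinary one-way or two-way computation repeated within each level of A.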
8.4 A FOUR-FACTOR PARTIALLY NESTED CLASSIFICATION


In this section, we briefly outline the analysis of variance for a four-factor partially nested classification. Consider four factors A, B, C, and D having a, b, c, and d levels, respectively. Let b levels of factor B be nested under each level of A, let c levels of factor C be nested under each level of B, and let d levels of factor D be crossed with a levels of A, b levels of B, and c levels of C. The model for this type of experimental layout can be written as

$$
y_{ijk\ell m} = \mu + \alpha_i + \beta_{j(i)} + \gamma_{k(ij)} + \delta_\ell + (\alpha\delta)_{i\ell} + (\beta\delta)_{j\ell(i)} + (\gamma\delta)_{k\ell(ij)} + e_{m(ijk\ell)},
\qquad
\begin{cases}
i = 1, \ldots, a\\
j = 1, \ldots, b\\
k = 1, \ldots, c\\
\ell = 1, \ldots, d\\
m = 1, \ldots, n
\end{cases}
$$

where the meaning of each symbol and the assumptions of the model are readily 
stated. Starting with the identity 


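$$
\begin{aligned}
y_{ijk\ell m} - \bar{y}_{.....} ={}& (\bar{y}_{i....} - \bar{y}_{.....}) + (\bar{y}_{ij...} - \bar{y}_{i....}) + (\bar{y}_{ijk..} - \bar{y}_{ij...}) \\
&+ (\bar{y}_{...\ell.} - \bar{y}_{.....}) + (\bar{y}_{i..\ell.} - \bar{y}_{i....} - \bar{y}_{...\ell.} + \bar{y}_{.....}) \\
&+ (\bar{y}_{ij.\ell.} - \bar{y}_{ij...} - \bar{y}_{i..\ell.} + \bar{y}_{i....}) \\
&+ (\bar{y}_{ijk\ell.} - \bar{y}_{ijk..} - \bar{y}_{ij.\ell.} + \bar{y}_{ij...}) \\
&+ (y_{ijk\ell m} - \bar{y}_{ijk\ell.}),
\end{aligned}
$$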


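the total sum of squares is partitioned as

$$SS_T = SS_A + SS_{B(A)} + SS_{C(B)} + SS_D + SS_{AD} + SS_{BD(A)} + SS_{CD(B)} + SS_E,$$

where

$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{d}\sum_{m=1}^{n} (y_{ijk\ell m} - \bar{y}_{.....})^2,$$

$$SS_A = bcdn\sum_{i=1}^{a} (\bar{y}_{i....} - \bar{y}_{.....})^2,$$

$$SS_{B(A)} = cdn\sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{y}_{ij...} - \bar{y}_{i....})^2,$$

$$SS_{C(B)} = dn\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (\bar{y}_{ijk..} - \bar{y}_{ij...})^2,$$

$$SS_D = abcn\sum_{\ell=1}^{d} (\bar{y}_{...\ell.} - \bar{y}_{.....})^2,$$

$$SS_{AD} = bcn\sum_{i=1}^{a}\sum_{\ell=1}^{d} (\bar{y}_{i..\ell.} - \bar{y}_{i....} - \bar{y}_{...\ell.} + \bar{y}_{.....})^2,$$

$$SS_{BD(A)} = cn\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{\ell=1}^{d} (\bar{y}_{ij.\ell.} - \bar{y}_{ij...} - \bar{y}_{i..\ell.} + \bar{y}_{i....})^2,$$

$$SS_{CD(B)} = n\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{d} (\bar{y}_{ijk\ell.} - \bar{y}_{ijk..} - \bar{y}_{ij.\ell.} + \bar{y}_{ij...})^2,$$

and

$$SS_E = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{d}\sum_{m=1}^{n} (y_{ijk\ell m} - \bar{y}_{ijk\ell.})^2,$$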


with the usual notations of dots and bars. The corresponding mean squares, denoted by $MS_A$, $MS_{B(A)}$, $MS_{C(B)}$, $MS_D$, $MS_{AD}$, $MS_{BD(A)}$, $MS_{CD(B)}$, and $MS_E$, are obtained by dividing the sums of squares by the respective degrees of freedom. The resultant analysis of variance is summarized in Table 8.2. The proper test statistic for any main effect or interaction of interest can be obtained from an examination of the analysis of variance table. Estimates of the variance components are, as usual, obtained by solving the equations that result from equating the mean squares to their respective expected values.
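The degrees of freedom for the four-factor layout follow from the rule of thumb in the Remark of Section 8.2, applied term by term. The Python sketch below is an illustration under that assumption, with interactions taking the product of their component degrees of freedom. The function name and the example sizes are hypothetical. Its output should agree with the degrees-of-freedom column of Table 8.2.

```python
def four_factor_df(a: int, b: int, c: int, d: int, n: int) -> dict:
    """Degrees of freedom for A, B within A, C within B, and D crossed with A, B, C."""
    df = {
        "A": a - 1,                              # crossed factor
        "B(A)": a * (b - 1),                     # nested in A
        "C(B)": a * b * (c - 1),                 # nested in B (and hence in A)
        "D": d - 1,                              # crossed factor
        "AD": (a - 1) * (d - 1),                 # product of component df
        "BD(A)": a * (b - 1) * (d - 1),
        "CD(B)": a * b * (c - 1) * (d - 1),
        "Error": a * b * c * d * (n - 1),        # replicates within cells
    }
    df["Total"] = a * b * c * d * n - 1
    return df


if __name__ == "__main__":
    table = four_factor_df(a=3, b=2, c=4, d=5, n=2)    # arbitrary illustrative sizes
    for source, value in table.items():
        print(f"{source:6s}{value:5d}")
    assert sum(v for k, v in table.items() if k != "Total") == table["Total"]
```

The final assertion is a useful check when building such a table by hand: the component degrees of freedom must always add to $abcdn - 1$.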


8.5 WORKED EXAMPLE FOR MODEL II


Schultz (1954) discussed the results of an analysis of variance performed on data on the calcium, phosphorus, and magnesium content of turnip leaves. The data were obtained as follows. "Duplicate [microchemical] analyses were made on each of four randomly-selected leaves from each of four turnip plants picked at random. . . . Duplicate determinations were made on each ash solution from a particular leaf. . . . The analyses of the two sets of ash solutions were made at different times." The analysis of variance for the calcium data is given in Table 8.3.

It is evident from the structure of the experiment that the plants are crossed with ashings and leaves are nested within plants. Since both plants and leaves within plants were randomly selected, both these factors should be regarded as random. In addition, the factor ashing should also be treated as random, inasmuch as the two ashings might be regarded as coming from repeated experiments on the same leaves (a new random sample from each leaf might be taken at future periods, ashed, and then analyzed in duplicate).
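TABLE 8.3
Analysis of Variance for the Calcium Content of Turnip Leaves Data

Source of Variation               Degrees of   Mean        Expected Mean Square                                                                                                   F Value   p-Value
                                  Freedom      Square
Plants                             3           6.202154    $\sigma_e^2 + 2\sigma_{\beta\gamma(\alpha)}^2 + 4\times 2\sigma_{\alpha\gamma}^2 + 2\times 2\sigma_{\beta(\alpha)}^2 + 4\times 2\times 2\sigma_\alpha^2$
Leaves (within plants)            12           0.605917    $\sigma_e^2 + 2\sigma_{\beta\gamma(\alpha)}^2 + 2\times 2\sigma_{\beta(\alpha)}^2$                                      43.38     <0.001
Ashings                            1           0.02945     $\sigma_e^2 + 2\sigma_{\beta\gamma(\alpha)}^2 + 4\times 2\sigma_{\alpha\gamma}^2 + 4\times 4\times 2\sigma_\gamma^2$      0.88      0.417
Plants × Ashings                   3           0.033569    $\sigma_e^2 + 2\sigma_{\beta\gamma(\alpha)}^2 + 4\times 2\sigma_{\alpha\gamma}^2$                                         2.40      0.119
Ashings × Leaves (within plants)  12           0.013968    $\sigma_e^2 + 2\sigma_{\beta\gamma(\alpha)}^2$                                                                           3.92     <0.001
Error                             32           0.003560    $\sigma_e^2$
Total                             63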


Source: Schultz (1954). Used with permission. 


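The mathematical model for the experimental design would be

$$
y_{ijk\ell} = \mu + \alpha_i + \beta_{j(i)} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk(i)} + e_{\ell(ijk)},
\qquad
\begin{cases}
i = 1, \ldots, 4\\
j = 1, \ldots, 4\\
k = 1, 2\\
\ell = 1, 2
\end{cases}
$$

where $\mu$ is the general mean, $\alpha_i$ is the effect of the i-th plant, $\beta_{j(i)}$ is the effect of the j-th leaf within the i-th plant, $\gamma_k$ is the effect of the k-th ashing, $(\alpha\gamma)_{ik}$ is the interaction of the i-th plant with the k-th ashing, $(\beta\gamma)_{jk(i)}$ is the interaction of the j-th leaf with the k-th ashing within the i-th plant, and $e_{\ell(ijk)}$ is the customary error term (duplicate analyses). Under the assumption that all the effects are random, the $\alpha_i$'s, $\beta_{j(i)}$'s, $\gamma_k$'s, $(\alpha\gamma)_{ik}$'s, $(\beta\gamma)_{jk(i)}$'s, and $e_{\ell(ijk)}$'s are normally distributed with zero means and variances $\sigma_\alpha^2$, $\sigma_{\beta(\alpha)}^2$, $\sigma_\gamma^2$, $\sigma_{\alpha\gamma}^2$, $\sigma_{\beta\gamma(\alpha)}^2$, and $\sigma_e^2$, respectively.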

The null hypothesis that the variance component due to a particular source is zero can be tested by the ratio of the mean square for that source to the mean square whose expectation is the same as that of the mean square being tested, except for the component due to the source being tested, which is zero under the null hypothesis. Thus, the ashings × leaves (within plants) interaction is tested against the error mean square, and the difference is highly significant (p < 0.001). Similarly, the plants × ashings interaction is tested against ashings × leaves (within plants); a value this large would occur by chance alone more than 10 percent of the time (p = 0.119), so the interaction is not significant. Among the main effects, ashings is tested against plants × ashings and is not found to be significant (p = 0.417); leaves (within plants) is tested against ashings × leaves (within plants) and is found to be highly significant (p < 0.001). There is, however, no exact test for plants. As discussed in Section 5.5, an approximate test may be constructed using the test statistic


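$$F' = \frac{MS_A}{MS_{AC} + MS_{B(A)} - MS_{BC(A)}},$$

which has an approximate F distribution with $df_A$ and $\nu'$ degrees of freedom, where

$$\nu' = \frac{(MS_{AC} + MS_{B(A)} - MS_{BC(A)})^2}{\dfrac{(MS_{AC})^2}{df_{AC}} + \dfrac{(MS_{B(A)})^2}{df_{B(A)}} + \dfrac{(MS_{BC(A)})^2}{df_{BC(A)}}}.$$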


In the example, the values of $F'$ and $\nu'$ ($\nu'$ rounded to the nearest integer) are found to be:

$$F' = \frac{6.202154}{0.033569 + 0.605917 - 0.013968} = 9.92$$

and

$$\nu' = \frac{(0.033569 + 0.605917 - 0.013968)^2}{\dfrac{(0.033569)^2}{3} + \dfrac{(0.605917)^2}{12} + \dfrac{(0.013968)^2}{12}} \approx 13.$$

From Appendix Table V, values as large as 9.92 with 3 and 13 degrees of freedom occur in less than 1 percent of trials by chance alone (p = 0.001). Thus, there is strong evidence that the plants differ significantly in calcium content.
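The arithmetic of the approximate test is easy to mechanize. Below is a minimal Python sketch of the Satterthwaite-type degrees of freedom used above. It takes the mean squares of Table 8.3 as inputs, and the helper name `satterthwaite_df` is hypothetical. As in the formula for $\nu'$, the subtracted mean square still contributes a positive term to the denominator.

```python
def satterthwaite_df(terms):
    """Approximate df of a linear combination sum(sign * MS).

    terms: iterable of (mean_square, df, sign) with sign = +1 or -1.
    The sign enters only the numerator; each term adds MS**2 / df to the denominator.
    """
    terms = list(terms)
    combo = sum(sign * ms for ms, df, sign in terms)
    return combo ** 2 / sum(ms ** 2 / df for ms, df, sign in terms)


# Mean squares and degrees of freedom from Table 8.3.
MS_A, DF_A = 6.202154, 3          # plants
MS_AC, DF_AC = 0.033569, 3        # plants x ashings
MS_BA, DF_BA = 0.605917, 12       # leaves (within plants)
MS_BCA, DF_BCA = 0.013968, 12     # ashings x leaves (within plants)

f_prime = MS_A / (MS_AC + MS_BA - MS_BCA)
nu_prime = satterthwaite_df([(MS_AC, DF_AC, +1), (MS_BA, DF_BA, +1), (MS_BCA, DF_BCA, -1)])
print(f"F' = {f_prime:.2f} with {DF_A} and {nu_prime:.1f} degrees of freedom")
```

The unrounded value of $\nu'$ is about 12.6, which the text rounds to 13. If SciPy is installed, `scipy.stats.f.sf(f_prime, DF_A, nu_prime)` gives the upper-tail probability directly, without rounding the degrees of freedom.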

As pointed out in Section 5.5, an alternative test for plants may be based on the statistic

$$F'' = \frac{MS_A + MS_{BC(A)}}{MS_{AC} + MS_{B(A)}},$$

which has an approximate F distribution with $\nu_1''$ and $\nu_2''$ degrees of freedom, where

$$\nu_1'' = \frac{(MS_A + MS_{BC(A)})^2}{\dfrac{(MS_A)^2}{df_A} + \dfrac{(MS_{BC(A)})^2}{df_{BC(A)}}}$$

and

$$\nu_2'' = \frac{(MS_{AC} + MS_{B(A)})^2}{\dfrac{(MS_{AC})^2}{df_{AC}} + \dfrac{(MS_{B(A)})^2}{df_{B(A)}}}.$$
Again, in the example at hand, the values of $F''$, $\nu_1''$, and $\nu_2''$ (the degrees of freedom rounded to the nearest integer) are found to be:

$$F'' = \frac{6.202154 + 0.013968}{0.033569 + 0.605917} = 9.72,$$

$$\nu_1'' = \frac{(6.202154 + 0.013968)^2}{\dfrac{(6.202154)^2}{3} + \dfrac{(0.013968)^2}{12}} \approx 3,$$

and

$$\nu_2'' = \frac{(0.033569 + 0.605917)^2}{\dfrac{(0.033569)^2}{3} + \dfrac{(0.605917)^2}{12}} \approx 13.$$

These values are essentially the same as those for the test statistic $F'$, and we reach exactly the same conclusion as before.
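The alternative statistic avoids subtracting a mean square, so every coefficient is +1. The Python sketch below recomputes $F''$, $\nu_1''$, and $\nu_2''$ from the Table 8.3 mean squares. The helper name is hypothetical.

```python
def satterthwaite_df(mean_squares, dfs):
    """Approximate df for a plain sum of mean squares (all coefficients +1)."""
    total = sum(mean_squares)
    return total ** 2 / sum(ms ** 2 / df for ms, df in zip(mean_squares, dfs))


# Mean squares and degrees of freedom from Table 8.3.
MS_A, DF_A = 6.202154, 3          # plants
MS_AC, DF_AC = 0.033569, 3        # plants x ashings
MS_BA, DF_BA = 0.605917, 12       # leaves (within plants)
MS_BCA, DF_BCA = 0.013968, 12     # ashings x leaves (within plants)

f_double = (MS_A + MS_BCA) / (MS_AC + MS_BA)
nu1 = satterthwaite_df([MS_A, MS_BCA], [DF_A, DF_BCA])    # numerator df
nu2 = satterthwaite_df([MS_AC, MS_BA], [DF_AC, DF_BA])    # denominator df
print(f"F'' = {f_double:.2f} with {nu1:.1f} and {nu2:.1f} degrees of freedom")
```

Because $MS_A$ dominates the numerator, $\nu_1''$ stays close to $df_A = 3$.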
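Finally, the estimates of the variance components are given by

$$\hat\sigma_e^2 = 0.00356,$$

$$\hat\sigma_{\beta\gamma(\alpha)}^2 = \tfrac{1}{2}(0.013968 - 0.003560) = 0.00520,$$

$$\hat\sigma_{\alpha\gamma}^2 = \tfrac{1}{8}(0.033569 - 0.013968) = 0.00245,$$

$$\hat\sigma_\gamma^2 = \tfrac{1}{32}(0.02945 - 0.033569) = -0.00013,$$

$$\hat\sigma_{\beta(\alpha)}^2 = \tfrac{1}{4}(0.605917 - 0.013968) = 0.14799,$$

and

$$\hat\sigma_\alpha^2 = \tfrac{1}{16}(6.202154 - 0.033569 - 0.605917 + 0.013968) = 0.34854.$$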


Assuming that $\hat\sigma_\gamma^2 = 0$, these variance components account for 0.7, 1.0, 0.5, 0.0, 29.2, and 68.6 percent of the total variation, respectively. Thus, the largest single source of variability is attributable to variation between plants, which accounts for nearly 70 percent of the total variation. Leaves within plants are also quite variable and account for most of the remaining variation. Although the ashings × leaves (within plants) interaction is statistically significant, it accounts for only 1 percent of the total variation.
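The estimates and percentages above come from equating each mean square to its expectation in Table 8.3 and solving from the bottom of the table upward. The Python sketch below reproduces them. It assumes, as the text does, that the negative estimate for ashings is replaced by zero before the shares are computed. The dictionary keys are ad hoc labels. The printed shares agree with the text up to rounding in the last digit.

```python
# Mean squares from Table 8.3 (calcium content of turnip leaves).
ms = {
    "plants": 6.202154,
    "leaves(plants)": 0.605917,
    "ashings": 0.02945,
    "plants x ashings": 0.033569,
    "ashings x leaves(plants)": 0.013968,
    "error": 0.003560,
}

# Equate each mean square to its expected value (Table 8.3) and solve.
est = {
    "error": ms["error"],
    "ashings x leaves(plants)": (ms["ashings x leaves(plants)"] - ms["error"]) / 2,
    "plants x ashings": (ms["plants x ashings"] - ms["ashings x leaves(plants)"]) / 8,
    "ashings": (ms["ashings"] - ms["plants x ashings"]) / 32,
    "leaves(plants)": (ms["leaves(plants)"] - ms["ashings x leaves(plants)"]) / 4,
    "plants": (ms["plants"] - ms["plants x ashings"]
               - ms["leaves(plants)"] + ms["ashings x leaves(plants)"]) / 16,
}

# As in the text, the negative estimate for ashings is set to zero
# before computing each component's share of the total variation.
truncated = {k: max(v, 0.0) for k, v in est.items()}
total = sum(truncated.values())
share = {k: 100 * v / total for k, v in truncated.items()}

for k in est:
    print(f"{k:26s} {est[k]: .5f} {share[k]:5.1f}%")
```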
TABLE 8.4
Measured Strengths of Tire Cords from Two Plants Using Different Production Processes

                                           Distance (yds)
                    0          500        1,000       1,500       2,000       2,500
Plant   Bobbin    1    2     1    2     1    2      1    2      1    2      1    2
1 11). =f <5. 22 =8 <2 Se et Oo eI 
2(1) 1 10 1 x 9 2 10 —-4 -4 3 4 8 
31) 2 3 5 -5 1 -1 -6 tL 2 5 7 5 
4(1) 6 10 1 5 0 5 2 —2 1 1 5 9 
B(1)> 0. SB. SS TOY AS) ed. RP SS 6 
61) -1 -10 -8 -8 -2 Se Seg: as ae 2g 
7(1) —9 —2 5 —2 7 —2 -—2 —2 -l 2 10 5 
8(1) 0 » 5 2 Ss 3 10 -1 4 1 7 -1 
2 1(2) 10 8 —5 6 2 13 7 +15 17 «+14 «+18 I 


22) 9 12 6 15 15 12 18 16 13 #10 9 U1 
3(2) 0 8 12 6 2 0 5 4 18 8 6 8 
42) 5 9 2 16 15 5 21 18 15 It 18 15 


52) -1 -1 lt 19 12 10 1 2 13 9 4 6 
62) 7 1 15 WW 12 12 8 12 22 I 12 21 
7(2) —5 1 2 10 12 1 2 #1 10 #10 #7 = 5 


8(2) 10 9 10 1 9 6 12 WW 1 20 WU 15 


Source: Akutowicz and Traux (1956, Table 1, p. 4). Used with permission. 


8.6 WORKED EXAMPLE FOR MODEL III


Akutowicz and Traux (1956) described an experiment designed to investigate the variability of the strength of tire cord. Prior to the establishment of control of cord testing laboratories, data were obtained from two plants that used different production processes to make nominally the same kind of tire cord. A random sample of eight bobbins of cord was selected from each plant, and six positions spaced at 500-yard intervals (0 to 2,500 yards) along the length of each bobbin were chosen for testing. In order to give as nearly as possible "duplicate" measurements, adjacent pairs of breaks were made at each position, and the strength was recorded in deviations from 21.5 lb in units of 0.1 lb. The coded raw data are given in Table 8.4.

It is evident that the structure of this experiment is somewhat different from 
the crossed and nested classification models discussed in the earlier chapters. 
If the bobbins were crossed with the plants, so that the first bobbin with plant 
1 had some correspondence with the first bobbin with plant 2, and the second 
bobbin in like manner, and so on, then one would have a three-way crossed 
classification with replication in the cells. However, clearly, this is not the 
situation. The bobbins are not crossed with plants; rather they are nested within 
the plants since eight bobbins of cords were selected at random from each
plant. Similarly, if the distances were obtained as random samples from each
bobbin, so that they were nested within the bobbins, with no crossing between
distances and bobbins or plants, then one would have a completely nested or
hierarchical classification. However, again, this is not so. In this experiment, the 
distances were chosen at 500-yard intervals over the length of each bobbin and 
thus constitute a fixed effect which is crossed with bobbins and plants. Thus, the 
experimental structure conforms to the partially nested or hierarchical design, 
described earlier in this chapter, where the distances are crossed with the plants 
and bobbins and bobbins are nested within plants. 
The mathematical model for the experimental design would be

$$
y_{ijk\ell} = \mu + \alpha_i + \beta_{j(i)} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk(i)} + e_{\ell(ijk)},
\qquad
\begin{cases}
i = 1, 2\\
j = 1, \ldots, 8\\
k = 1, \ldots, 6\\
\ell = 1, 2
\end{cases}
$$

where $\mu$ is the general mean, $\alpha_i$ is the effect of the i-th plant, $\beta_{j(i)}$ is the effect of the j-th bobbin nested within the i-th plant, $\gamma_k$ is the effect of the k-th distance, $(\alpha\gamma)_{ik}$ is the interaction of the i-th plant with the k-th distance, $(\beta\gamma)_{jk(i)}$ is the interaction of the k-th distance with the j-th bobbin within the i-th plant, and $e_{\ell(ijk)}$ is the customary error term. Furthermore, the $\alpha_i$'s, $\gamma_k$'s, and $(\alpha\gamma)_{ik}$'s are fixed effects with the constraints

$$\sum_{i=1}^{2}\alpha_i = 0, \qquad \sum_{k=1}^{6}\gamma_k = 0, \qquad \sum_{i=1}^{2}(\alpha\gamma)_{ik} = 0, \qquad \sum_{k=1}^{6}(\alpha\gamma)_{ik} = 0;$$

and the $\beta_{j(i)}$'s, $(\beta\gamma)_{jk(i)}$'s, and $e_{\ell(ijk)}$'s are random effects that are independently and normally distributed with mean zero and variances $\sigma_{\beta(\alpha)}^2$, $\sigma_{\beta\gamma(\alpha)}^2$, and $\sigma_e^2$, respectively.

To calculate the sums of squares, we first form the bobbins within plants totals ($y_{ij..}$), plant × distance totals ($y_{i.k.}$), plant totals ($y_{i...}$), distance totals ($y_{..k.}$), and the grand total ($y_{....}$), as shown in Table 8.5. The other quantities needed in the calculations of the sums of squares are:


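$$\frac{y_{....}^2}{2\times 8\times 6\times 2} = \frac{(1{,}016)^2}{192} = 5{,}376.333,$$

$$\sum_{i=1}^{2}\sum_{j=1}^{8}\sum_{k=1}^{6}\sum_{\ell=1}^{2} y_{ijk\ell}^2 = (-1)^2 + \cdots + (15)^2 = 15{,}788,$$

$$\frac{1}{2}\sum_{i=1}^{2}\sum_{j=1}^{8}\sum_{k=1}^{6} y_{ijk.}^2 = \frac{(-6)^2 + (-10)^2 + \cdots + (26)^2}{2} = 13{,}697,$$

$$\frac{1}{6\times 2}\sum_{i=1}^{2}\sum_{j=1}^{8} y_{ij..}^2 = \frac{(-31)^2 + (35)^2 + \cdots + (156)^2}{12} = 11{,}347.167,$$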
TABLE 8.5
Calculation of Cell and Marginal Totals ($y_{ijk.}$)

                                     Distance (yds)
Plant   Bobbin      0    500   1,000   1,500   2,000   2,500    $y_{ij..}$   $y_{i...}$
1 1(1) -6 —10 —7 -1 -8 -3i1 
2(1) 11 3 4 6 -l 12 35 
3(1) 1 0 0 —5 7 12 13 
4(1) 16 6 5 —4 2 14 39 31 
5(1)  -9 -5  —-4 ~3 0 9 —-12 
61) —-ll —16 0 -3 -9 -6 —45 
71) 9-11 3 5 —4 15 9 
8(1) 2 —7 8 9 5 6 23 
        $y_{1.k.}$   -9    -26     19    -11      4     54
2 1(2) 18 15 223 29116 
2(2) 21 21 27 3423 20s—s«*146 
3(2) 8 18 2 9 26 14 77 
4(2) 14 18 20 39 2 £33 150 985 
5(2)  -2 30 «22 21 22 10 ~—-:103 
6(2) 23 26 224 202 «33 33s«d9 
7(2)  —4 S. “27 15-20 12 78 
8(2) 19 25 25 233 «438 «~=— 6 Ss«d:56 
        $y_{2.k.}$   97    147    162    183    219    177                 $y_{....}$
        $y_{..k.}$   88    121    181    172    223    231                 1,016


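$$\frac{1}{8\times 2}\sum_{i=1}^{2}\sum_{k=1}^{6} y_{i.k.}^2 = \frac{(-9)^2 + (-26)^2 + \cdots + (177)^2}{16} = 10{,}888.250,$$

$$\frac{1}{8\times 6\times 2}\sum_{i=1}^{2} y_{i...}^2 = \frac{(31)^2 + (985)^2}{96} = 10{,}116.521,$$

and

$$\frac{1}{2\times 8\times 2}\sum_{k=1}^{6} y_{..k.}^2 = \frac{(88)^2 + (121)^2 + \cdots + (231)^2}{32} = 5{,}869.375.$$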


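Now, the sum of squares for plants is an ordinary main effect sum of squares; that is,

$$SS_A = \frac{1}{8\times 6\times 2}\sum_{i=1}^{2} y_{i...}^2 - \frac{y_{....}^2}{2\times 8\times 6\times 2} = 10{,}116.521 - 5{,}376.333 = 4{,}740.188.$$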
The sum of squares for bobbins within plants is an ordinary nested effect; that is,

$$SS_{B(A)} = \frac{1}{6\times 2}\sum_{i=1}^{2}\sum_{j=1}^{8} y_{ij..}^2 - \frac{1}{8\times 6\times 2}\sum_{i=1}^{2} y_{i...}^2 = 11{,}347.167 - 10{,}116.521 = 1{,}230.646.$$


The sum of squares for distance is again an ordinary main effect; that is,

$$SS_C = \frac{1}{2\times 8\times 2}\sum_{k=1}^{6} y_{..k.}^2 - \frac{y_{....}^2}{2\times 8\times 6\times 2} = 5{,}869.375 - 5{,}376.333 = 493.042.$$


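The sum of squares for plant × distance is an ordinary two-way interaction; that is,

$$
\begin{aligned}
SS_{AC} &= \frac{1}{8\times 2}\sum_{i=1}^{2}\sum_{k=1}^{6} y_{i.k.}^2 - \frac{1}{8\times 6\times 2}\sum_{i=1}^{2} y_{i...}^2 - \frac{1}{2\times 8\times 2}\sum_{k=1}^{6} y_{..k.}^2 + \frac{y_{....}^2}{2\times 8\times 6\times 2}\\
&= 10{,}888.250 - 10{,}116.521 - 5{,}869.375 + 5{,}376.333\\
&= 278.687.
\end{aligned}
$$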


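The sum of squares for distance × bobbin within plants is

$$
\begin{aligned}
SS_{BC(A)} &= \frac{1}{2}\sum_{i=1}^{2}\sum_{j=1}^{8}\sum_{k=1}^{6} y_{ijk.}^2 - \frac{1}{6\times 2}\sum_{i=1}^{2}\sum_{j=1}^{8} y_{ij..}^2 - \frac{1}{8\times 2}\sum_{i=1}^{2}\sum_{k=1}^{6} y_{i.k.}^2 + \frac{1}{8\times 6\times 2}\sum_{i=1}^{2} y_{i...}^2\\
&= 13{,}697 - 11{,}347.167 - 10{,}888.250 + 10{,}116.521\\
&= 1{,}578.104.
\end{aligned}
$$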


The total sum of squares is

$$SS_T = \sum_{i=1}^{2}\sum_{j=1}^{8}\sum_{k=1}^{6}\sum_{\ell=1}^{2} y_{ijk\ell}^2 - \frac{y_{....}^2}{2\times 8\times 6\times 2} = 15{,}788 - 5{,}376.333 = 10{,}411.667.$$
Finally, the error sum of squares is obtained by subtraction as

$$
\begin{aligned}
SS_E &= SS_T - SS_A - SS_{B(A)} - SS_C - SS_{AC} - SS_{BC(A)}\\
&= 10{,}411.667 - 4{,}740.188 - 1{,}230.646 - 493.042 - 278.687 - 1{,}578.104\\
&= 2{,}091.000.
\end{aligned}
$$
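All six sums of squares are differences of the same seven "uncorrected" quantities, so the whole computation fits in a few lines. The Python sketch below takes the quantities listed above as given. It also shows that the error sum of squares can be obtained directly, without subtraction, as $15{,}788 - 13{,}697$.

```python
# Uncorrected quantities computed above for the tire-cord data.
raw = 15788.0               # sum of squared observations
q_cell = 13697.0            # (1/2) * sum of y_ijk.^2
q_bobbin = 11347.167        # (1/12) * sum of y_ij..^2
q_plant_distance = 10888.250
q_plant = 10116.521
q_distance = 5869.375
cf = 5376.333               # y....^2 / 192

ss = {
    "plants": q_plant - cf,
    "bobbins(plants)": q_bobbin - q_plant,
    "distance": q_distance - cf,
    "plant x distance": q_plant_distance - q_plant - q_distance + cf,
    "distance x bobbins(plants)": q_cell - q_bobbin - q_plant_distance + q_plant,
    "error": raw - q_cell,  # direct form; equals SS_T minus the other five
}
ss_total = raw - cf

for source, value in ss.items():
    print(f"{source:28s}{value:12.3f}")
print(f"{'total':28s}{ss_total:12.3f}")
```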


The complete analysis of variance is shown in Table 8.6. 

The plant effect, tested against bobbins within plants, is evidently highly significant (p < 0.001). Similarly, bobbins within plants, tested against the error term, also differ quite significantly (p < 0.001). Distances also appear to differ significantly, that is, to have a considerable effect on cord strength (p = 0.002). There is some evidence of interaction between plant and distance (p = 0.041). However, there does not seem to be any interaction between distance and bobbin within plants (p = 0.426), indicating that the effect of different distances is probably the same for all bobbins. The variance components $\sigma_e^2$, $\sigma_{\beta\gamma(\alpha)}^2$, and $\sigma_{\beta(\alpha)}^2$ are estimated as


$$\hat\sigma_e^2 = 21.781,$$

$$\hat\sigma_{\beta\gamma(\alpha)}^2 = \tfrac{1}{2}(22.544 - 21.781) = 0.382,$$

and

$$\hat\sigma_{\beta(\alpha)}^2 = \tfrac{1}{12}(87.903 - 21.781) = 5.510.$$
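The mean squares, F ratios, and variance component estimates of this example can be reproduced from the sums of squares. The Python sketch below uses the error terms stated in the text. It assumes the expected mean squares implied by the text's estimates, namely $E(MS_{B(A)}) = \sigma_e^2 + 12\sigma_{\beta(\alpha)}^2$ (the restricted mixed model, matching the BMDP 8V output in Figure 8.2). The SAS and SPSS outputs instead use the unrestricted form, which adds $2\sigma_{\beta\gamma(\alpha)}^2$ to that expectation and so test bobbins against the distance × bobbin mean square.

```python
# Sums of squares and degrees of freedom from the computations above.
ss = {"P": 4740.188, "B(P)": 1230.646, "D": 493.042,
      "PD": 278.687, "DB(P)": 1578.104, "E": 2091.000}
df = {"P": 1, "B(P)": 14, "D": 5, "PD": 5, "DB(P)": 70, "E": 96}
ms = {k: ss[k] / df[k] for k in ss}

# F ratios with the error terms used in the text.
f = {
    "P": ms["P"] / ms["B(P)"],       # plants vs bobbins within plants
    "B(P)": ms["B(P)"] / ms["E"],    # bobbins vs error
    "D": ms["D"] / ms["DB(P)"],      # distance vs distance x bobbin
    "PD": ms["PD"] / ms["DB(P)"],    # plant x distance vs distance x bobbin
    "DB(P)": ms["DB(P)"] / ms["E"],  # distance x bobbin vs error
}

# Variance components under the restricted-model expectations assumed above.
var_e = ms["E"]
var_db = (ms["DB(P)"] - ms["E"]) / 2
var_b = (ms["B(P)"] - ms["E"]) / 12

for k in ss:
    print(f"{k:6s} df={df[k]:3d} MS={ms[k]:10.3f} F={f.get(k, float('nan')):7.2f}")
print(f"var_e={var_e:.3f} var_db={var_db:.3f} var_b={var_b:.3f}")
```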


It is evident from the preceding analysis that there is a great deal of variability 
in the strength of tire cord and the larger part of this variability arises due to 
differences in the manufacturing processes of the two plants. The distances 
also differ quite significantly. The variability between bobbins within a given 
plant is quite large and the duplicate measurements on adjacent pairs also differ 
considerably. 


8.7 USE OF STATISTICAL COMPUTING PACKAGES 


Among the SAS procedures, PROC ANOVA and PROC NESTED cannot be used to analyze a partially nested model, since they are written for either completely crossed or completely nested designs. PROC GLM is the procedure of choice for analyzing this type of model. Again, the analysis involving a random or mixed effects model can be handled via the RANDOM and TEST statements. PROC MIXED or PROC VARCOMP can be used for the estimation of variance components. For instructions regarding SAS commands, see Section 11.1.
DATA STRENGTHS;
INPUT PLANT BOBBIN DISTANCE $ DUPLICA STRENGTH;
DATALINES;
1 1 0YD 1 -1
1 1 0YD 2 -5
1 2 0YD 1 1
1 2 0YD 2 10
. . .
2 8 2500YD 2 15
;
PROC GLM;
CLASSES PLANT BOBBIN DISTANCE;
MODEL STRENGTH = PLANT BOBBIN(PLANT) DISTANCE PLANT*DISTANCE BOBBIN*DISTANCE(PLANT);
RANDOM BOBBIN(PLANT) BOBBIN*DISTANCE(PLANT);
TEST H=PLANT E=BOBBIN(PLANT);
TEST H=DISTANCE E=BOBBIN*DISTANCE(PLANT);
TEST H=PLANT*DISTANCE E=BOBBIN*DISTANCE(PLANT);
RUN;

CLASS      LEVELS   VALUES
PLANT        2      1 2
BOBBIN       8      1 2 3 4 5 6 7 8
DISTANCE     6      0YD 500YD 1000YD 1500YD 2000YD 2500YD
NUMBER OF OBS. IN DATA SET=192

The SAS System
General Linear Models Procedure
Dependent Variable: STRENGTH
                               Sum of           Mean
Source              DF        Squares         Square   F Value   Pr > F
Model               95      8320.6666        87.5859      4.02   0.0001
Error               96      2091.0000        21.7812
Corrected Total    191     10411.6666667

R-Square     C.V.        Root MSE     STRENGTH Mean
0.799168     88.196006   4.6670387    5.29166667

Source                  DF   Type III SS   Mean Square   F Value   Pr > F
PLANT                    1     4740.1875     4740.1875    217.63   0.0001
BOBBIN(PLANT)           14     1230.6458       87.9033      4.04   0.0001
DISTANCE                 5      493.0417       98.6083      4.53   0.0009
PLANT*DISTANCE           5      278.6875       55.7375      2.56   0.0322
BOBBIN*DISTAN(PLANT)    70     1578.1042       22.5443      1.04   0.4340

Source                  Type III Expected Mean Square
PLANT                   Var(Error) + 2 Var(BOBBIN*DISTAN(PLANT)) + 12 Var(BOBBIN(PLANT)) + Q(PLANT,PLANT*DISTANCE)
BOBBIN(PLANT)           Var(Error) + 2 Var(BOBBIN*DISTAN(PLANT)) + 12 Var(BOBBIN(PLANT))
DISTANCE                Var(Error) + 2 Var(BOBBIN*DISTAN(PLANT)) + Q(DISTANCE,PLANT*DISTANCE)
PLANT*DISTANCE          Var(Error) + 2 Var(BOBBIN*DISTAN(PLANT)) + Q(PLANT*DISTANCE)
BOBBIN*DISTAN(PLANT)    Var(Error) + 2 Var(BOBBIN*DISTAN(PLANT))

Tests of Hypotheses using the Type III MS for BOBBIN(PLANT) as an error term
Source           DF   Type III SS   Mean Square   F Value   Pr > F
PLANT             1     4740.1875     4740.1875     53.93   0.0001

Tests of Hypotheses using the Type III MS for BOBBIN*DISTAN(PLANT) as an error term
Source           DF   Type III SS   Mean Square   F Value   Pr > F
DISTANCE          5      493.0416       98.6083      4.37   0.0016

Tests of Hypotheses using the Type III MS for BOBBIN*DISTAN(PLANT) as an error term
Source           DF   Type III SS   Mean Square   F Value   Pr > F
PLANT*DISTANCE    5      278.6875       55.7375      2.47   0.0403

(i) SAS application: SAS GLM instructions and output for the partially nested mixed effects analysis of variance.


DATA LIST
 /PLANT 1 BOBBIN 3 DISTANCE 5 DUPLICA 7 STRENGTH 9-11.
BEGIN DATA.
1 1 1 1 -1
1 1 1 2 -5
. . .
2 8 6 2 15
END DATA.
GLM STRENGTH BY PLANT BOBBIN DISTANCE DUPLICA
 /DESIGN PLANT BOBBIN(PLANT) DISTANCE PLANT*DISTANCE DISTANCE*BOBBIN(PLANT)
 /RANDOM BOBBIN.

Tests of Between-Subjects Effects
Dependent Variable: STRENGTH
Source                                 Type III SS    df   Mean Square
PLANT                    Hypothesis       4740.188     1      4740.188
                         Error            1230.646    14        87.903(a)
BOBBIN(PLANT)            Hypothesis       1230.646    14        87.903
                         Error            1578.104    70        22.544(b)
DISTANCE                 Hypothesis        493.042     5        98.608
                         Error            1578.104    70        22.544(b)
PLANT*DISTANCE           Hypothesis        278.688     5        55.738
                         Error            1578.104    70        22.544(b)
BOBBIN*DISTANCE(PLANT)   Hypothesis       1578.104    70        22.544
                         Error            2091.000    96        21.781(c)
a MS(BOBBIN(PLANT))   b MS(BOBBIN*DISTANCE(PLANT))   c MS(Error)

Expected Mean Squares(a,b)
                                      Variance Component
Source                   Var(B(P))   Var(B*D(P))   Var(Error)   Quadratic Term
PLANT                       12.000         2.000        1.000   Plant
BOBBIN(PLANT)               12.000         2.000        1.000
DISTANCE                      .000         2.000        1.000   Distance
PLANT*DISTANCE                .000         2.000        1.000   Plant*Distance
BOBBIN*DISTANCE(PLANT)        .000         2.000        1.000
Error                         .000          .000        1.000
a For each source, the expected mean square equals the sum of the coefficients in the cells times the variance components, plus a quadratic term involving effects in the Quadratic Term cell.
b Expected Mean Squares are based on the Type III Sums of Squares.

(ii) SPSS application: SPSS GLM instructions and output for the partially nested mixed effects analysis of variance.


FIGURE 8.2 Program Instructions and Outputs for the Partially Nested Mixed 
Effects Analysis of Variance: Measured Strengths of Tire Cords from Two Plants 
Using Different Production Processes (Table 8.4). 
FILE='C:\SAHAI\TEXTO\EJE22.TXT'.
FORMAT=FREE.
VARIABLES=2.
/VARIABLE NAMES=REP1,REP2.
/DESIGN NAMES=PLANT,DISTANCE,BOBBIN,REPLIC.
LEVELS=2,6,8,2.
RANDOM=BOBBIN,REPLIC.
FIXED=PLANT,DISTANCE.
MODEL='P,B(P),D,R(PBD)'.

BMDP8V - GENERAL MIXED MODEL ANALYSIS OF VARIANCE - EQUAL CELL SIZES
Release: 7.0 (BMDP/DYNAMIC)

ANALYSIS OF VARIANCE DESIGN
MODEL   P, B(P), D, R(PBD)

ANALYSIS OF VARIANCE FOR DEPENDENT VARIABLE 1
SOURCE     ERROR TERM   SUM OF SQUARES   PROB.
MEAN       B(P)             5376.333     .0000
PLANT      B(P)             4740.188     .0000
DISTANCE   DB(P)             493.042     .0016
B(P)       R(PDB)           1230.646     .0000
PD         DB(P)             278.688     .0403
DB(P)      R(PDB)           1578.104     .4343
R(PDB)                      2091.000

SOURCE     EXPECTED MEAN SQUARE     ESTIMATES OF VARIANCE COMPONENTS
MEAN       192(1)+12(4)+(7)         27.54391
PLANT      96(2)+12(4)+(7)          48.46129
DISTANCE   32(3)+2(6)+(7)            2.37700
B(P)       12(4)+(7)                 5.51017
PD         16(5)+2(6)+(7)            2.07457
DB(P)      2(6)+(7)                  0.38155
R(PDB)     (7)                      21.78125

(iii) BMDP application: BMDP 8V instructions and output for the partially nested mixed effects analysis of variance.


FIGURE 8.2 (continued) 


Among the SPSS procedures, either MANOVA or GLM could be used for 
the analysis involving random and mixed effects models. For the estimation of 
variance components, SPSS VARCOMP will be the procedure of choice. For 
instructions regarding SPSS commands, see Section 11.2. 

Among the BMDP programs, 3V or 8V can be used for partially nested designs. For designs with balanced structure, 8V is preferable; 2V can also be used, but the sums of squares corresponding to nested factors must then be obtained by combining the appropriate crossed-factor sums of squares, as in (8.2.3) and (8.2.4).


8.8 WORKED EXAMPLE USING STATISTICAL PACKAGES 


In this section, we illustrate the application of statistical packages to perform 
partially nested analysis of variance for the data set of the example presented 
in Section 8.6. Figure 8.2 illustrates the program instructions and the output 
results for analyzing data in Table 8.4 using SAS GLM, SPSS GLM, and BMDP 
8V. The typical output provides the data format listed at the top, all cell means, and the entries of the analysis of variance table. Note that the sums of squares and mean squares are the same as those obtained by the manual computations in Section 8.6.


EXERCISES 


1. An experiment is designed to study the performance of three different lathes. Each lathe has three different speeds at which the product is manufactured, and each was operated at two different feed rates. The runs are made in random order, and three observations are taken at each speed and feed rate. The relevant data in certain standard units are as
follows.

Lathe                 I                      II                     III
Speed           1      2      3        1      2      3        1      2      3
Feed rate
Low          41.2   40.8   43.3     39.2   40.2   39.9     39.9   40.9   40.7
             37.4   41.9   43.9     40.6   41.8   42.8     40.1   40.5   39.9
             38.7   42.1   44.2     41.1   40.9   41.4     40.2   39.8   38.8
High         31.4   35.2   32.8     31.2   31.2   33.1     31.3   30.3   31.8
             33.4   37.4   33.2     32.1   32.2   34.2     33.2   34.5   29.1
             34.2   36.7   31.9     33.4   34.9   30.9     32.4   33.1   31.9


(a) Describe the model and the assumptions for the experiment. It 
is assumed that all three factors are fixed. 

(b) Analyze the data and report the analysis of variance table. 

(c) Test whether there are differences in lathes. Use α = 0.05.

(d) Test whether there are differences in feed rates. Use α = 0.05.

(e) Test whether there are differences in speeds. Use α = 0.05.

(f) Suppose that a large number of feed rates are available, and 
the two taken for the experiment are selected randomly. Modify 
the analysis of variance in part (b) to give the expected mean 
squares for this case and estimate the variance components of 
the model. 


2. An experiment is designed to study the microhardness of high-strength steel purchased from three different foundries. Each foundry supplied the steel in three different lengths of bars: 3.0, 3.5, or 4.0 inches. Inasmuch as the production of different lengths of bar from a common ingot required different extrusion techniques, this factor may be important. Moreover, the bars were forged from ingots produced at different temperatures. Each foundry provided two test specimens of each bar from three different temperatures. The resulting data in certain standard units are as follows.


Foundry A i I 


Temperature (°C)   1100    1200    1300      1100    1200    1300      1100    1200    1300
Bar Length (in.)
3.0               1.841   1.957   1.846     1.912   1.957   1.926     1.858   1.886   1.935
                  1.869   1.911   1.817     1.874   1.993   1.931     1.897   1.879   1.926
3.5               1.927   1.919   1.861     1.885   1.995   1.957     1.884   1.871   1.993
                  1.911   1.973   1.849     1.879   1.986   1.968     1.875   1.876   1.975
4.0               1.898   1.957   1.884     1.858   1.973   1.947     1.912   1.891   1.929
                  1.893   1.993   1.826     1.826   1.939   1.953     1.873   1.882   1.934
(a) Describe the model and the assumptions for the experiment. It is assumed that all three factors are fixed.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in the microhardness of steel purchased from the three foundries. Use α = 0.05.

(d) Test whether there are differences in the microhardness of steel of different lengths of bar. Use α = 0.05.

(e) Suppose that bars may be acquired in many lengths and the three 
lengths being used in the experiment were selected randomly. 
Make the necessary modifications in the analysis of variance in 
part (b) to reflect the expected mean squares for this situation 
and estimate the variance components of the model. 

3. Brownlee (1965, p. 544) reported data from an experiment involving
five laboratories that participated in measuring the brightness of six 
lamps of each of two types. The brightness of each lamp type was 
measured in all five laboratories. The data on values of candle power 
measured at different laboratories are as follows. 


                       Laboratory*
Type   Lamp      A      B      C      D      E
I       1      741    768    770    772    738
        2      731    763    755    742    724
        3      731    763    757    760    728
        4      759    779    775    774    752
        5      738    758    750    750    730
        6      770    795    800    800    768
II      1      625    650    655    651    615
        2      590    611    605    625    588
        3      602    630    640    630    605
        4      578    607    640    608    581
        5      578    604    605    608    573
        6      625    673    670    664    631

* All figures have been multiplied by 100 and then 1,000 subtracted from them.
Source: Brownlee (1965, p. 544). Used with permission. 


(a) Describe the model and the assumptions for the experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Test whether there are differences in laboratories. Use α = 0.05.

(d) Test whether there are differences in lamp types. Use α = 0.05.

(e) Test whether there are differences in lamps within types. Use α = 0.05.

(f) Determine 95 percent confidence limits for the difference be- 
tween lamp types. (It is assumed that the laboratory x lamp type 
interaction is negligible.) 
(g) Determine 95 percent confidence limits for the difference between laboratory A and laboratory E. (Again it may be assumed that the laboratory × lamp type interaction is negligible.)

(h) Determine 95 percent confidence limits for the difference between lamp type I in laboratory A and lamp type II in laboratory B.

(i) Estimate the component of variance for lamps within types.

4. Brownlee (1965, p. 545) reported data from an experiment involving five laboratories that took part in a test comparison of their measurement procedures for evaluating the impact strength of a type of fiberboard. Panels from two batches of board were tested in duplicate by each of the five laboratories on each of three days. The three days reported in the experiment were different for each laboratory. The data on impact strengths are as follows.


                          Laboratory
Day   Batch       A       B       C       D       E
1       1      1483    1449    1499    1428    1509
               1496    1400    1472    1401    1439
        2      1504    1465    1506    1407    1480
               1505    1423    1537    1416    1429
2       1      1441    1477    1483    1404    1416
               1416    1471    1509    1419    1441
        2      1477    1418    1578    1455    1364
               1457    1445    1486    1435    1441
3       1      1450    1446    1489    1414    1419
               1478    1398    1435    1446    1444
        2      1435    1424    1499    1423    1437
               1478    1426    1491    1442    1438


Source: Brownlee (1965, p. 545). Used with permission. 


(a) Describe the model and the assumptions for the experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Test whether there are differences in laboratories. Use α = 0.05.

(d) Test whether there are differences in days. Use α = 0.05.

(e) Test whether there are differences in batches. Use α = 0.05.

(f) Determine 95 percent confidence limits for the mean difference 
between laboratories A and E. 

(g) Estimate the within day component of variance averaged over 
laboratories. 
(h) Estimate the between day component of variance averaged over laboratories.


5. Desmond (1954) reported the results of an experiment in testing the 
operation of voltage regulators involving four setting stations. From 
each of these stations, three regulators were randomly selected and 
each was tested at four test stations. The data are given as follows. 


Setting    Regulator            Test Station
Station    No.           1       2       3       4
  1          1         16.5    16.1    16.2     …
             2         15.9    15.4    15.8     …
             3         16.9    15.9    16.0     …
  2          1         16.7    16.1    15.7     …
             2         17.0    16.4    16.4     …
             3         16.3    16.1    16.1     …
  3          1         17.0    16.1    15.8     …
             2         16.6    16.3    15.9     …
             3         16.3    15.9    16.2     …
  4          1         16.8    16.7    16.3     …
             2         16.1    16.0    16.0     …
             3         16.2    16.1    16.1     …


Source: Desmond (1954). Used with permission. 


(a) Describe the model and the assumptions for the experiment. Assume a fixed effect model for setting stations and test stations.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are differences in test stations. Use α = 0.05.

(d) Test whether there are differences in setting stations. Use α = 0.05.

(e) Test whether there are differences in regulators within setting stations. Use α = 0.05.

(f) Present a table of the means of each setting station with the estimated standard error for each mean.

(g) Determine 95 percent confidence limits for the mean differences between test stations 1 and 4.


6. Consider the experiment described in the worked example in Section 8.5. The analysis of variance for the phosphorus data is performed in exactly the same manner as for the calcium data, and the results are as follows.
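Analysis of Variance for the Phosphorus Content of Turnip Leaves Data

Source of Variation                 Degrees of   Mean       Expected Mean Square
                                    Freedom      Square
Plants                              3            0.056375   $\sigma_e^2 + 2\sigma_{\beta\gamma(\alpha)}^2 + 4 \times 2\sigma_{\alpha\gamma}^2 + 2 \times 2\sigma_{\beta(\alpha)}^2 + 4 \times 2 \times 2\sigma_\alpha^2$
Leaves (within plants)              12           0.035786   $\sigma_e^2 + 2\sigma_{\beta\gamma(\alpha)}^2 + 2 \times 2\sigma_{\beta(\alpha)}^2$
Ashings                             1            0.000467   $\sigma_e^2 + 2\sigma_{\beta\gamma(\alpha)}^2 + 4 \times 2\sigma_{\alpha\gamma}^2 + 4 \times 4 \times 2\sigma_\gamma^2$
Plants × Ashings                    3            0.000664   $\sigma_e^2 + 2\sigma_{\beta\gamma(\alpha)}^2 + 4 \times 2\sigma_{\alpha\gamma}^2$
Ashings × Leaves (within plants)    12           0.000935   $\sigma_e^2 + 2\sigma_{\beta\gamma(\alpha)}^2$
Error                               32           0.000457   $\sigma_e^2$
Total                               63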


Source: Schultz (1954). Used with permission. 


(a) Test whether there are differences in effects due to plants. Use α = 0.05.

(b) Test whether there are differences in effects due to leaves within plants. Use α = 0.05.

(c) Test whether there are differences in effects due to ashings. Use α = 0.05.

(d) Test whether there are differences in effects due to plants × ashings. Use α = 0.05.

(e) Test whether there are differences in effects due to ashings × leaves within plants. Use α = 0.05.

(f) Estimate the variance components of the model and determine 
95 percent confidence intervals on them. 


7. Anderson (1954) reported the results of an experiment designed to compare the absorption properties of ceramic compositions. There are 15 ceramic compositions, and the experiment was performed under three different temperatures. Two batches of each composition were prepared, and two firings were made at each temperature. Finally, one observation was made for each firing of a batch, giving a total of 180 observations. The mathematical model for this experiment would be:


$$y_{ijk\ell} = \mu + \alpha_i + \beta_{j(i)} + \gamma_k + \delta_{\ell(k)} + (\alpha\gamma)_{ik} + (\alpha\delta)_{i\ell(k)} + (\beta\gamma)_{jk(i)} + e_{(ijk\ell)}, \qquad i = 1, 2, 3;\; j = 1, 2;\; k = 1, 2, \ldots, 15;\; \ell = 1, 2,$$
where $\mu$ is the general mean, $\alpha_i$ is the effect of the i-th temperature, $\beta_{j(i)}$ is the effect of the j-th firing within the i-th temperature, $\gamma_k$ is the effect of the k-th composition, $\delta_{\ell(k)}$ is the effect of the $\ell$-th batch within the k-th composition, $(\alpha\gamma)_{ik}$ is the interaction of the i-th temperature with the k-th composition, $(\alpha\delta)_{i\ell(k)}$ is the interaction of the i-th temperature with the $\ell$-th batch within the k-th composition, $(\beta\gamma)_{jk(i)}$ is the interaction of the j-th firing with the k-th composition within the i-th temperature, and $e_{(ijk\ell)}$ is the customary error term. Note that it is assumed that there are no $(\beta\delta)_{j\ell(ik)}$ interactions. Under the assumption that all effects are random, the $\alpha_i$'s, $\beta_{j(i)}$'s, $\gamma_k$'s, $\delta_{\ell(k)}$'s, $(\alpha\gamma)_{ik}$'s, $(\alpha\delta)_{i\ell(k)}$'s, $(\beta\gamma)_{jk(i)}$'s, and $e_{(ijk\ell)}$'s are normally and independently distributed with means zero and variances $\sigma_\alpha^2$, $\sigma_{\beta(\alpha)}^2$, $\sigma_\gamma^2$, $\sigma_{\delta(\gamma)}^2$, $\sigma_{\alpha\gamma}^2$, $\sigma_{\alpha\delta(\gamma)}^2$, $\sigma_{\beta\gamma(\alpha)}^2$, and $\sigma_e^2$, respectively. The analysis of variance table is given as follows.


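Analysis of Variance for the Ceramic Compositions Data

Source of Variation                            Degrees of   Mean         Expected Mean Square
                                               Freedom      Square
Temperatures                                   2            1,179.9900   $\sigma_e^2 + 2\sigma_{\alpha\delta(\gamma)}^2 + 2\sigma_{\beta\gamma(\alpha)}^2 + 4\sigma_{\alpha\gamma}^2 + 30\sigma_{\beta(\alpha)}^2 + 60\sigma_\alpha^2$
Firings (within temperatures)                  3            0.1521       $\sigma_e^2 + 2\sigma_{\beta\gamma(\alpha)}^2 + 30\sigma_{\beta(\alpha)}^2$
Compositions                                   14           10.3400      $\sigma_e^2 + 2\sigma_{\alpha\delta(\gamma)}^2 + 2\sigma_{\beta\gamma(\alpha)}^2 + 4\sigma_{\alpha\gamma}^2 + 6\sigma_{\delta(\gamma)}^2 + 12\sigma_\gamma^2$
Batches (within compositions)                  15           0.7405       $\sigma_e^2 + 2\sigma_{\alpha\delta(\gamma)}^2 + 6\sigma_{\delta(\gamma)}^2$
Temperatures × Compositions                    28           1.1130       $\sigma_e^2 + 2\sigma_{\alpha\delta(\gamma)}^2 + 2\sigma_{\beta\gamma(\alpha)}^2 + 4\sigma_{\alpha\gamma}^2$
Temperatures × Batches (within compositions)   30           0.0857       $\sigma_e^2 + 2\sigma_{\alpha\delta(\gamma)}^2$
Firings × Compositions (within temperatures)   42           0.0818       $\sigma_e^2 + 2\sigma_{\beta\gamma(\alpha)}^2$
Error                                          45           0.0631       $\sigma_e^2$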


Source: Anderson (1954). Used with permission. 


(a) Test whether there are differences in absorption properties among different temperatures. Use α = 0.05.

(b) Test whether there are differences in absorption properties among firings within temperatures. Use α = 0.05.

(c) Test whether there are differences in absorption properties among compositions. Use α = 0.05.

(d) Test whether there are differences in absorption properties among batches within compositions.

(e) Test for the following interaction effects: temperatures × compositions, temperatures × batches (within compositions), and firings × compositions (within temperatures).

(f) Determine the point and interval estimates for each of the vari- 
ance components of the model. 

8. Consider a variation of the four-factor partially nested classification 
described in Section 8.4, where now b levels of factor B are nested 
under each level of A and d levels of factor D are nested under each 
level of C; that is, model (8.4.1) is now given by 


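$$y_{ijk\ell m} = \mu + \alpha_i + \beta_{j(i)} + \gamma_k + \delta_{\ell(k)} + (\alpha\gamma)_{ik} + (\alpha\delta)_{i\ell(k)} + (\beta\gamma)_{jk(i)} + e_{m(ijk\ell)}, \qquad i = 1, \ldots, a;\; j = 1, \ldots, b;\; k = 1, \ldots, c;\; \ell = 1, \ldots, d;\; m = 1, \ldots, n,$$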
where the meaning of each symbol and the assumptions of the model are readily stated. Note that it is assumed that certain second- and higher-order interactions are zero.

(a) Describe the assumptions of the model under Models I, II, and III. Under Model III assume that factors A and C are fixed and B and D are random.

(b) Develop the analysis of variance including expected mean squares under the assumptions of Models I, II, and III.

(c) Describe appropriate F tests for the effects of factors A, B, C, and D and their interactions for all three models, assuming normality for the random effects.

9. Consider a three-factor study where factors A and B are crossed and factor C is nested within factors A and B. Further, suppose that A, B, and C are all random having a, b, and c levels, respectively, and there are n replications in each cell. The mathematical model for this type of layout would be

$$y_{ijk\ell} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \gamma_{k(ij)} + e_{\ell(ijk)}, \qquad i = 1, \ldots, a;\; j = 1, \ldots, b;\; k = 1, \ldots, c;\; \ell = 1, \ldots, n,$$

where $\alpha_i$ is the effect of the i-th level of factor A, $\beta_j$ is the effect of the j-th level of factor B, $(\alpha\beta)_{ij}$ is the interaction between the i-th level of factor A and the j-th level of factor B, $\gamma_{k(ij)}$ is the effect of the k-th level of factor C within the combination of the i-th level of factor A and the j-th level of factor B, and $e_{\ell(ijk)}$ is the customary error term. It is assumed that the random effects $\alpha_i$'s, $\beta_j$'s, $(\alpha\beta)_{ij}$'s, $\gamma_{k(ij)}$'s, and $e_{\ell(ijk)}$'s are all mutually and completely uncorrelated random variables with zero means and variances $\sigma_\alpha^2$, $\sigma_\beta^2$, $\sigma_{\alpha\beta}^2$, $\sigma_{\gamma(\alpha\beta)}^2$, and $\sigma_e^2$, respectively. The overall sum of squares is partitioned as


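$$SS_T = SS_A + SS_B + SS_{AB} + SS_{C(AB)} + SS_E,$$

where

$$SS_{C(AB)} = n\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}(\bar{y}_{ijk.} - \bar{y}_{ij..})^2$$

and

$$SS_E = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n}(y_{ijk\ell} - \bar{y}_{ijk.})^2.$$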
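The two sums of squares above can be checked numerically. The following Python sketch is a minimal illustration, assuming the data are stored as a nested list indexed y[i][j][k][l] (level of A, level of B, level of C within the cell, replicate); the function name and layout are hypothetical, and a balanced design is assumed.

```python
def nested_c_and_error_ss(y):
    """Return (SS_C(AB), SS_E) for factor C nested within the A x B cells.

    y[i][j][k][l] is the l-th replicate for the k-th level of C in cell (i, j);
    the design is assumed balanced (c levels of C, n replicates everywhere).
    """
    ss_c, ss_e = 0.0, 0.0
    for row in y:                        # levels of factor A
        for cell in row:                 # levels of factor B within this level of A
            n = len(cell[0])             # replicates per level of C
            c_means = [sum(reps) / n for reps in cell]     # y-bar_{ijk.}
            cell_mean = sum(c_means) / len(c_means)        # y-bar_{ij..}
            ss_c += n * sum((m - cell_mean) ** 2 for m in c_means)
            ss_e += sum((v - m) ** 2 for reps, m in zip(cell, c_means) for v in reps)
    return ss_c, ss_e

# Hypothetical data: a = b = 1, c = 2 levels of C, n = 2 replicates.
print(nested_c_and_error_ss([[[[1.0, 3.0], [5.0, 7.0]]]]))
```

Within each cell the two pieces add up to the total squared deviation about the cell mean, which is a quick sanity check on any implementation.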


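Finally, the analysis of variance table for this model is shown as follows.

Source of Variation          Degrees of       Sum of          Mean            Expected Mean Square
                             Freedom          Squares         Square
Factor A                     a − 1            $SS_A$          $MS_A$          $\sigma_e^2 + n\sigma_{\gamma(\alpha\beta)}^2 + cn\sigma_{\alpha\beta}^2 + bcn\sigma_\alpha^2$
Factor B                     b − 1            $SS_B$          $MS_B$          $\sigma_e^2 + n\sigma_{\gamma(\alpha\beta)}^2 + cn\sigma_{\alpha\beta}^2 + acn\sigma_\beta^2$
Interaction A × B            (a − 1)(b − 1)   $SS_{AB}$       $MS_{AB}$       $\sigma_e^2 + n\sigma_{\gamma(\alpha\beta)}^2 + cn\sigma_{\alpha\beta}^2$
Factor C (within A and B)    ab(c − 1)        $SS_{C(AB)}$    $MS_{C(AB)}$    $\sigma_e^2 + n\sigma_{\gamma(\alpha\beta)}^2$
Error                        abc(n − 1)       $SS_E$          $MS_E$          $\sigma_e^2$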


(a) Develop the results on expected mean squares using the rules 
given in Appendix U. 

(b) Assuming normality, determine the tests of hypotheses for 
testing the effects corresponding to factors A, B, C, and the 
interaction A x B. 
(c) Determine the estimators of the variance components based on the analysis of variance procedure.

(d) Repeat parts (a) through (c) for a mixed model analysis where A is fixed and B and C are random.

(e) Repeat parts (a) through (c) for a mixed model analysis where A and B are fixed and C is random.

10. Consider a four-factor study where B is nested within A; D is nested within A, B, and C; and A and C are crossed. Further, suppose that A, B, C, and D are all random having a, b, c, and d levels, respectively, and there are n replications in each cell. The mathematical model for this type of layout would be

$$y_{ijk\ell m} = \mu + \alpha_i + \beta_{j(i)} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk(i)} + \delta_{\ell(ijk)} + e_{m(ijk\ell)}, \qquad i = 1, \ldots, a;\; j = 1, \ldots, b;\; k = 1, \ldots, c;\; \ell = 1, \ldots, d;\; m = 1, \ldots, n,$$


where $\alpha_i$ is the effect of the i-th level of factor A, $\beta_{j(i)}$ is the effect of the j-th level of factor B within the i-th level of factor A, $\gamma_k$ is the effect of the k-th level of factor C, $(\alpha\gamma)_{ik}$ is the interaction between the i-th level of factor A and the k-th level of factor C, $(\beta\gamma)_{jk(i)}$ is the interaction between the j-th level of factor B and the k-th level of factor C within the i-th level of factor A, $\delta_{\ell(ijk)}$ is the effect of the $\ell$-th level of factor D within the combination of the i-th level of factor A, the j-th level of factor B, and the k-th level of factor C, and $e_{m(ijk\ell)}$ is the customary error term. It is assumed that the random effects $\alpha_i$'s, $\beta_{j(i)}$'s, $\gamma_k$'s, $(\alpha\gamma)_{ik}$'s, $(\beta\gamma)_{jk(i)}$'s, $\delta_{\ell(ijk)}$'s, and $e_{m(ijk\ell)}$'s are all mutually and completely uncorrelated random variables with zero means and variances $\sigma_\alpha^2$, $\sigma_{\beta(\alpha)}^2$, $\sigma_\gamma^2$, $\sigma_{\alpha\gamma}^2$, $\sigma_{\beta\gamma(\alpha)}^2$, $\sigma_{\delta(\alpha\beta\gamma)}^2$, and $\sigma_e^2$, respectively.

(a) Develop a partitioning of the total sum of squares corresponding 
to four main effects (including two nested), the two interactions, 
and a residual term. 

(b) Report the analysis of variance table including expected mean 
squares. 

(c) Assuming normality, determine the tests of hypotheses for testing the effects corresponding to factors A, B, C, and D, and the interactions A × C and B × C (within A).

(d) Determine the estimators of variance components based on the 
analysis of variance procedure. 

(e) Repeat parts (b) through (d) for a mixed model analysis where 
factors A and C are fixed and B and D are random. 


Finite Population and 
Other Models 


9.0 PREVIEW 


As discussed earlier, so far in this volume we have been primarily concerned 
with random effects models or Model II based on the infinite population theory, 
that is, when the treatments included in the experiment are assumed to be a 
random sample from a population of treatments having infinite size or when 
the experimenter selects the levels at random from a large number of possi- 
ble levels of a factor usually considered as infinite. However, as described in 
Section 1.4, there are situations when the treatments selected may be a sample 
from a finite population and then the assumptions of an infinite population may 
be inappropriate. For example, in a large laboratory, there could be a total of 
10 analysts and the data obtained on just three of them could be used to make 
inferences concerning a new method for the determination of arginine content 
as used by the entire group of 10 analysts. 

The finite population model is also of interest because if we let the population sizes go to infinity, then we obtain Model II, and if we decrease the population size until it equals the sample size (so that the sample comprises the entire population), then we obtain Model I. If some population sizes are increased to infinity while others are decreased to the sample sizes, we are in a Model III situation.
Under finite population models, the calculations for sums of squares, degrees 
of freedom, and mean squares remain the same. The difference lies in the 
derivation of expected mean squares and consequently in the estimation of the 
parameters and the testing of hypotheses. In this chapter, we briefly present 
the results for the finite population models. These models were first considered 
by Tukey (1949 a, c, 1950), Cornfield and Tukey (1956), and Bennett and 
Franklin (1954, Chapter VII). The interested reader is advised to go over these 
references for a more thorough treatment of the topic. 


9.1 ONE-WAY FINITE POPULATION MODEL 


For a one-way classification, with a groups or a levels of factor A and n ob- 
servations per group, the mathematical model under finite population theory is 
TABLE 9.1

Analysis of Variance for Model (9.1.1)

Source of      Degrees of     Sum of      Mean       Expected
Variation      Freedom        Squares     Square     Mean Square
Between        a − 1          $SS_B$      $MS_B$     $\sigma_e^2 + n\sigma_\alpha^2$
Within         a(n − 1)       $SS_W$      $MS_W$     $\sigma_e^2$
Total          an − 1         $SS_T$


the same as (2.1.1), namely,

$$y_{ij} = \mu + \alpha_i + e_{ij}, \qquad i = 1, 2, \ldots, a;\; j = 1, 2, \ldots, n, \tag{9.1.1}$$

where $y_{ij}$ is the value of the j-th observation in the i-th group. However, now the assumptions are as follows:


(i) As before, $\mu$ is the constant general effect.

(ii) There is a population of effects due to factor A of size A with mean zero and variance $\sigma_\alpha^2$. The $\alpha_i$'s are assumed to be a random sample of size a from this population. We denote a particular level of A in the population by $\alpha_I$, where $I = 1, 2, \ldots, A$. The $\alpha_I$'s in the population satisfy the condition $\sum_{I=1}^{A} \alpha_I = 0$.

(iii) Sampling is random in each group and independent among different groups. The $e_{ij}$'s are a random sample of size n from an infinite population with mean zero and variance $\sigma_e^2$.

(iv) We make the following definitions of population variances, that is, the variance components of the model (9.1.1):

$$\sigma_\alpha^2 = \frac{1}{A-1}\sum_{I=1}^{A}\alpha_I^2$$

and

$$\sigma_e^2 = E(e_{ij}^2).$$


For the finite population model (9.1.1), the entire analysis of variance, including the sums of squares, mean squares, and expected mean squares, remains the same and is summarized in Table 9.1. Thus, in Table 9.1, if A = a, the definition of $\sigma_\alpha^2$ corresponds to the Model I case of Table 2.1; and if A = ∞, it corresponds to the Model II case of Table 2.1.
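One way to see why Table 9.1 keeps its familiar form is that, with $\sigma_\alpha^2$ defined using the divisor A − 1, the sample variance of the sampled effects (divisor a − 1) is unbiased for $\sigma_\alpha^2$ under sampling without replacement, whatever the sample size a. The Python sketch below checks this by brute-force enumeration of every possible sample; the population of effects is made up for illustration and is not data from the text.

```python
from itertools import combinations
from statistics import mean, variance

def expected_sample_variance(population, a):
    """Average of the sample variance (divisor a - 1) over all samples of
    size a drawn without replacement from a finite population."""
    return mean(variance(sample) for sample in combinations(population, a))

# Hypothetical population of A = 6 factor-A effects that sum to zero.
alpha_pop = [-3.0, -1.0, 0.0, 0.5, 1.5, 2.0]
A = len(alpha_pop)
sigma2_alpha = sum(x * x for x in alpha_pop) / (A - 1)  # definition with divisor A - 1

for a in range(2, A + 1):
    # The enumerated average agrees with sigma2_alpha (up to rounding) for every a.
    print(a, expected_sample_variance(alpha_pop, a), sigma2_alpha)
```

Because the between-groups mean square contains the sample variance of the $\alpha_i$'s multiplied by n, this unbiasedness is what gives $E(MS_B) = \sigma_e^2 + n\sigma_\alpha^2$ for any a ≤ A.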


9.2 TWO-WAY CROSSED FINITE POPULATION MODEL 


For a two-way crossed classification, with factor A having a levels, factor B having b levels, and n replications per cell, the mathematical model under finite population is the same as (4.1.1); that is,

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}, \qquad i = 1, 2, \ldots, a;\; j = 1, 2, \ldots, b;\; k = 1, 2, \ldots, n, \tag{9.2.1}$$

where $y_{ijk}$ is the score of the k-th observation at the i-th level of factor A and the j-th level of factor B. However, underlying model (9.2.1), we now have the following assumptions:


(i) As before, $\mu$ is the constant general effect.

(ii) There is a population of main effects due to factor A of size A with mean zero and variance $\sigma_\alpha^2$. The $\alpha_i$'s are assumed to be a random sample of size a from this population. We denote a particular level of A in the population by $\alpha_I$, where $I = 1, 2, \ldots, A$. The $\alpha_I$'s satisfy the condition $\sum_{I=1}^{A} \alpha_I = 0$.

(iii) There is a population of main effects due to factor B of size B with mean zero and variance $\sigma_\beta^2$. The $\beta_j$'s are assumed to be a random sample of size b from this population. We denote a particular level of B in the population by $\beta_J$, where $J = 1, 2, \ldots, B$. The $\beta_J$'s satisfy the condition $\sum_{J=1}^{B} \beta_J = 0$.

(iv) For each combination of a potential level of A with a potential level of B, there is a population of interaction effects of size A × B with mean zero and variance $\sigma_{\alpha\beta}^2$. Selecting a particular I and a particular J determines the row and column, and hence the cell that forms their interaction, and with this cell is associated the interaction $(\alpha\beta)_{IJ}$. The interaction terms satisfy the conditions

$$\sum_{I=1}^{A}(\alpha\beta)_{IJ} = 0 \text{ for each } J \quad \text{and} \quad \sum_{J=1}^{B}(\alpha\beta)_{IJ} = 0 \text{ for each } I.$$

(v) Sampling is at random in each cell and independent between different cells. The $e_{ijk}$'s are a random sample of size n from an infinite population with mean zero and variance $\sigma_e^2$.

(vi) We make the following definitions of the population variances, that is, the variance components of model (9.2.1):

$$\sigma_\alpha^2 = \frac{1}{A-1}\sum_{I=1}^{A}\alpha_I^2, \qquad \sigma_\beta^2 = \frac{1}{B-1}\sum_{J=1}^{B}\beta_J^2, \qquad \sigma_{\alpha\beta}^2 = \frac{1}{(A-1)(B-1)}\sum_{I=1}^{A}\sum_{J=1}^{B}(\alpha\beta)_{IJ}^2,$$

and

$$\sigma_e^2 = E(e_{ijk}^2).$$
TABLE 9.2
Analysis of Variance for Model (9.2.1)

Source of            Degrees of       Sum of       Mean         Expected
Variation            Freedom          Squares      Square       Mean Square
Factor A             a − 1            $SS_A$       $MS_A$       $\sigma_e^2 + n(1 - b/B)\sigma_{\alpha\beta}^2 + bn\sigma_\alpha^2$
Factor B             b − 1            $SS_B$       $MS_B$       $\sigma_e^2 + n(1 - a/A)\sigma_{\alpha\beta}^2 + an\sigma_\beta^2$
Interaction A × B    (a − 1)(b − 1)   $SS_{AB}$    $MS_{AB}$    $\sigma_e^2 + n\sigma_{\alpha\beta}^2$
Error                ab(n − 1)        $SS_E$       $MS_E$       $\sigma_e^2$
Total                abn − 1          $SS_T$
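To make the population assumptions concrete, the Python sketch below takes a hypothetical table of population cell effects (A rows by B columns), splits it into main effects and interactions that satisfy the zero-sum conditions of assumptions (ii) through (iv), and evaluates $\sigma_\alpha^2$, $\sigma_\beta^2$, and $\sigma_{\alpha\beta}^2$ with the divisors A − 1, B − 1, and (A − 1)(B − 1) of assumption (vi). The double-centering used here is one convenient construction, assumed for illustration; the text does not prescribe how the population effects arise.

```python
def decompose(cells):
    """Split an A x B table of population cell effects into alpha_I, beta_J,
    and (alpha beta)_IJ terms whose sums over each index are zero."""
    A, B = len(cells), len(cells[0])
    grand = sum(map(sum, cells)) / (A * B)
    alpha = [sum(row) / B - grand for row in cells]                       # row effects
    beta = [sum(cells[I][J] for I in range(A)) / A - grand for J in range(B)]  # column effects
    ab = [[cells[I][J] - grand - alpha[I] - beta[J] for J in range(B)]
          for I in range(A)]                                              # interactions
    return alpha, beta, ab

def finite_population_variances(alpha, beta, ab):
    """Variance components with the divisors used in assumption (vi)."""
    A, B = len(alpha), len(beta)
    s2_alpha = sum(x * x for x in alpha) / (A - 1)
    s2_beta = sum(x * x for x in beta) / (B - 1)
    s2_ab = sum(x * x for row in ab for x in row) / ((A - 1) * (B - 1))
    return s2_alpha, s2_beta, s2_ab

# Hypothetical population with A = 2 levels of A and B = 3 levels of B.
cells = [[1.0, 2.0, 6.0],
         [3.0, 2.0, 1.0]]
alpha, beta, ab = decompose(cells)
print(finite_population_variances(alpha, beta, ab))  # (0.5, 0.75, 6.5)
```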


For the finite population model (9.2.1), the sums of squares and mean squares are the same as those shown in Table 4.2. However, the expected mean squares are those shown in Table 9.2. The derivation of expected mean squares involves some tedious algebra and can be found in Cornfield and Tukey (1956), Bennett and Franklin (1954, pp. 368-373), and Brownlee (1965, pp. 489-498). We simply mention here the results on the covariance structure of the $\alpha_i$'s, $\beta_j$'s, and $(\alpha\beta)_{ij}$'s, which are employed in finding expected mean squares. Thus,


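$$\operatorname{Var}(\alpha_i) = \left(1 - \frac{1}{A}\right)\sigma_\alpha^2, \qquad \operatorname{Cov}(\alpha_i, \alpha_{i'}) = -\frac{\sigma_\alpha^2}{A}, \quad i \ne i',$$

$$\operatorname{Var}(\beta_j) = \left(1 - \frac{1}{B}\right)\sigma_\beta^2, \qquad \operatorname{Cov}(\beta_j, \beta_{j'}) = -\frac{\sigma_\beta^2}{B}, \quad j \ne j',$$

$$\operatorname{Cov}\{(\alpha\beta)_{ij}, (\alpha\beta)_{i'j'}\} = \begin{cases} (1 - 1/A)(1 - 1/B)\,\sigma_{\alpha\beta}^2, & i = i',\ j = j', \\ -(1 - 1/B)\,\sigma_{\alpha\beta}^2/A, & i \ne i',\ j = j', \\ -(1 - 1/A)\,\sigma_{\alpha\beta}^2/B, & i = i',\ j \ne j', \\ \sigma_{\alpha\beta}^2/(AB), & i \ne i',\ j \ne j'. \end{cases}$$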
Furthermore, because the $\alpha_i$'s and $\beta_j$'s are selected independently, their covariances are zero. The values of the $(\alpha\beta)_{ij}$'s, on the other hand, in general depend on the i-th level of A and the j-th level of B, and thus the $(\alpha\beta)_{ij}$'s are not independent of the $\alpha_i$'s and $\beta_j$'s in the sample. However, it can be shown that the covariance between $\alpha_i$ and $(\alpha\beta)_{i'j}$ is zero irrespective of whether $i = i'$; similarly, the covariance between $\beta_j$ and $(\alpha\beta)_{ij'}$ is zero irrespective of whether $j = j'$. Thus, we obtain the following results:

$$\operatorname{Cov}(\alpha_i, \beta_j) = 0, \quad \operatorname{Cov}(\alpha_i, (\alpha\beta)_{ij}) = 0, \quad \operatorname{Cov}(\alpha_i, (\alpha\beta)_{i'j}) = 0,$$

$$\operatorname{Cov}(\beta_j, (\alpha\beta)_{ij}) = 0, \quad \text{and} \quad \operatorname{Cov}(\beta_j, (\alpha\beta)_{ij'}) = 0.$$
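The first two covariance results can be checked numerically in the same brute-force way. The Python sketch below, which uses a made-up population of effects summing to zero, averages over all ordered pairs of distinct population levels to obtain $\operatorname{Var}(\alpha_i)$ and $\operatorname{Cov}(\alpha_i, \alpha_{i'})$ exactly, and compares them with $(1 - 1/A)\sigma_\alpha^2$ and $-\sigma_\alpha^2/A$.

```python
from itertools import permutations

def sampled_effect_moments(population):
    """Exact Var(alpha_i) and Cov(alpha_i, alpha_i') for i != i' when sample
    levels are drawn without replacement from a population summing to zero."""
    A = len(population)
    var = sum(x * x for x in population) / A      # E(alpha_i^2), since the mean is zero
    pairs = list(permutations(population, 2))     # all ordered pairs of distinct levels
    cov = sum(x * y for x, y in pairs) / len(pairs)
    return var, cov

alpha_pop = [-3.0, -1.0, 0.0, 0.5, 1.5, 2.0]      # hypothetical population, A = 6
A = len(alpha_pop)
sigma2_alpha = sum(x * x for x in alpha_pop) / (A - 1)
var, cov = sampled_effect_moments(alpha_pop)
print(var, (1 - 1 / A) * sigma2_alpha)  # Var(alpha_i) vs (1 - 1/A) sigma_alpha^2
print(cov, -sigma2_alpha / A)           # Cov(alpha_i, alpha_i') vs -sigma_alpha^2 / A
```

The negative covariance arises because, once one level has been drawn, the remaining levels must offset it so that the population total stays zero; it vanishes as A grows, which is the Model II limit.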


TESTS OF HYPOTHESES 


The results on expected mean squares provide a valuable guide to deciding which mean squares in the analysis of variance table are to be compared. For example, from Table 9.2, it is seen that, in the general case, an exact F test is available only for the interaction: $E(MS_{AB})$ differs from $E(MS_E)$ only by the term $n\sigma_{\alpha\beta}^2$, whereas no other mean square matches $E(MS_A)$ when $\sigma_\alpha^2 = 0$ or $E(MS_B)$ when $\sigma_\beta^2 = 0$. To develop exact tests for the main effects, we must limit ourselves to certain special cases described in the following:


(i) In Table 9.2, if the samples of factor A and B levels correspond to the entire populations, so that A = a and B = b, then the expected values of the mean squares are those given in Table 4.2 for Model I and the finite population model becomes exactly Model I. Hence, the tests for the main effects are obtained by dividing their mean squares by the error mean square.

(ii) If the population sizes A and B are infinitely large, so that 1 − a/A and 1 − b/B both tend to 1, then all covariances approach zero and the expected values of the mean squares are given exactly as in Table 4.2 for Model II, and the finite population model becomes exactly Model II. Hence, the main effects are tested against the interaction mean square.

(iii) If the $\alpha_i$'s are samples from an infinite population and the $\beta_j$'s constitute the entire population (i.e., A = ∞ and B = b), then the expected values of the mean squares are given exactly as in Table 4.2 for Model III and the finite population model becomes exactly Model III. Thus, the factor A effect is tested against the error mean square and the factor B effect is tested against the interaction mean square.


In the following, we give the details of F tests of the hypotheses of interest for the general case of the finite population model, including the point and interval estimation of the parameters.

F Tests

An F test of the hypothesis

$$H_0^{AB}: \sigma_{\alpha\beta}^2 = 0 \quad \text{versus} \quad H_1^{AB}: \sigma_{\alpha\beta}^2 > 0$$

can be performed by using the statistic $MS_{AB}/MS_E$. Now, consider testing

$$H_0^{B}: \sigma_\beta^2 = 0 \quad \text{versus} \quad H_1^{B}: \sigma_\beta^2 > 0. \tag{9.2.2}$$

Under $H_0^B$, $E(MS_B) = \sigma_e^2 + n(1 - a/A)\sigma_{\alpha\beta}^2$. Since there is no other mean square with this expectation, an exact F test for the hypothesis (9.2.2) is not available. However, it is possible to determine a conservative or an approximate F test.

To obtain a conservative test, note that when $\sigma_\beta^2 = 0$, the quantity

$$\frac{MS_B \big/ \left[\sigma_e^2 + n\left(1 - \frac{a}{A}\right)\sigma_{\alpha\beta}^2\right]}{MS_{AB} \big/ \left(\sigma_e^2 + n\sigma_{\alpha\beta}^2\right)} \tag{9.2.3}$$

has an F distribution with b − 1 and (a − 1)(b − 1) degrees of freedom. It cannot, however, be used for testing the hypothesis (9.2.2), since the expressions involving unknown parameters do not cancel out, and therefore the statistic (9.2.3) cannot be evaluated. However, if we change the coefficient of $\sigma_{\alpha\beta}^2$ from $n(1 - a/A)$ to $n$ by neglecting $a/A$, then $\sigma_e^2 + n\sigma_{\alpha\beta}^2$ cancels out from both the numerator and the denominator, and (9.2.3) reduces to the statistic

$$F_B = MS_B / MS_{AB},$$

which is readily evaluated. Note that in this way we have reduced the value of the statistic (9.2.3), and thus, if we compare it with $F[b - 1, (a - 1)(b - 1); 1 - \alpha]$, we have made it harder to reject the null hypothesis. Such a test is conservative, since if we try to construct a test with significance level $\alpha$, we actually have one with level $\le \alpha$. When A is sufficiently large compared to a, this test is generally adequate. In addition, we can also develop a pseudo-F test for the hypothesis (9.2.2) by using a linear combination of the mean squares whose expected value is equal to $\sigma_e^2 + n(1 - a/A)\sigma_{\alpha\beta}^2$. We note from Table 9.2 that $E[(1 - a/A)MS_{AB} + (a/A)MS_E] = \sigma_e^2 + n(1 - a/A)\sigma_{\alpha\beta}^2$. Hence, the suggested F statistic is

$$F_B' = \frac{MS_B}{\left(1 - \frac{a}{A}\right)MS_{AB} + \frac{a}{A}MS_E}, \tag{9.2.4}$$

which has an approximate F distribution with b − 1 and $\nu_2$ degrees of freedom, where $\nu_2$ is approximated by

$$\nu_2 = \frac{\left[\left(1 - \frac{a}{A}\right)MS_{AB} + \frac{a}{A}MS_E\right]^2}{\dfrac{\left(1 - \frac{a}{A}\right)^2 MS_{AB}^2}{(a-1)(b-1)} + \dfrac{\left(\frac{a}{A}\right)^2 MS_E^2}{ab(n-1)}}. \tag{9.2.5}$$

Analogous tests, conservative as well as approximate, can be constructed for the hypothesis $H_0^A: \sigma_\alpha^2 = 0$ versus $H_1^A: \sigma_\alpha^2 > 0$.
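The conservative statistic $F_B$, the pseudo-F statistic (9.2.4), and the Satterthwaite degrees of freedom (9.2.5) are simple to compute once the mean squares are available. The Python sketch below is a minimal illustration; the function name and the mean-square values in the example are hypothetical, not taken from the text.

```python
def b_effect_statistics(ms_b, ms_ab, ms_e, a, b, n, A):
    """Conservative F_B and pseudo-F F_B' with Satterthwaite df, per (9.2.4)-(9.2.5)."""
    w = a / A                              # sampling fraction a/A for factor A
    denom = (1 - w) * ms_ab + w * ms_e     # estimates sigma_e^2 + n(1 - a/A) sigma_ab^2
    df_ab, df_e = (a - 1) * (b - 1), a * b * (n - 1)
    nu2 = denom ** 2 / (((1 - w) * ms_ab) ** 2 / df_ab + (w * ms_e) ** 2 / df_e)
    return {
        "F_B": ms_b / ms_ab,               # conservative test, df (b - 1, (a - 1)(b - 1))
        "F_B_prime": ms_b / denom,         # approximate test, df (b - 1, nu2)
        "df_num": b - 1,
        "nu2": nu2,
    }

# Hypothetical mean squares: a = 3 levels sampled from A = 6, b = 4, n = 2.
print(b_effect_statistics(ms_b=12.0, ms_ab=4.0, ms_e=2.0, a=3, b=4, n=2, A=6))
```

With these inputs the conservative statistic is 3.0 while $F_B'$ is 4.0, illustrating how neglecting a/A shrinks the statistic. Setting A = a makes the denominator $MS_E$ with $\nu_2 = ab(n-1)$ (Model I), and letting A grow makes it approach $MS_{AB}$ (Model II). Critical values can be read from F tables or, if SciPy is available, obtained from `scipy.stats.f`.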


POINT ESTIMATION 


The parameter $\mu$ is clearly estimated by

$$\hat{\mu} = \bar{y}_{...}.$$

The other parameters of interest are the variance components $\sigma_e^2$, $\sigma_{\alpha\beta}^2$, $\sigma_\beta^2$, and $\sigma_\alpha^2$. From Table 9.2, on equating mean squares to their respective expected values and solving for the variance components, we obtain


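$$\hat{\sigma}_e^2 = MS_E, \tag{9.2.8}$$

$$\hat{\sigma}_{\alpha\beta}^2 = \frac{1}{n}\left(MS_{AB} - MS_E\right), \tag{9.2.9}$$

$$\hat{\sigma}_\beta^2 = \frac{1}{an}\left[MS_B - \left(1 - \frac{a}{A}\right)MS_{AB} - \frac{a}{A}MS_E\right], \tag{9.2.10}$$

and

$$\hat{\sigma}_\alpha^2 = \frac{1}{bn}\left[MS_A - \left(1 - \frac{b}{B}\right)MS_{AB} - \frac{b}{B}MS_E\right]. \tag{9.2.11}$$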
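Estimators (9.2.8) through (9.2.11) translate directly into code. In the hypothetical Python helper below, passing A = a and B = b recovers the Model I estimators, while passing `math.inf` for both recovers the Model II estimators, because a/A then evaluates to 0. As with any analysis of variance estimator, the results can be negative.

```python
import math

def variance_components(ms_a, ms_b, ms_ab, ms_e, a, b, n, A, B):
    """ANOVA estimators (9.2.8)-(9.2.11) for the two-way finite population model."""
    fa, fb = a / A, b / B                   # sampling fractions a/A and b/B
    return {
        "sigma2_e": ms_e,
        "sigma2_ab": (ms_ab - ms_e) / n,
        "sigma2_b": (ms_b - (1 - fa) * ms_ab - fa * ms_e) / (a * n),
        "sigma2_a": (ms_a - (1 - fb) * ms_ab - fb * ms_e) / (b * n),
    }

# Hypothetical mean squares with a = 3, b = 4, n = 2.
ms = dict(ms_a=20.0, ms_b=12.0, ms_ab=4.0, ms_e=2.0, a=3, b=4, n=2)
print(variance_components(**ms, A=6, B=8))                # finite populations
print(variance_components(**ms, A=math.inf, B=math.inf))  # Model II limit
```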


INTERVAL ESTIMATION 


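To find a confidence interval for $\mu$, we note that there is no entry in the mean square column of Table 9.2 which is a multiple of $\operatorname{Var}(\bar{y}_{...})$ given by (9.2.6). Thus, we cannot find an exact confidence interval for $\mu$. However, as earlier, we can determine an approximate confidence interval using the Satterthwaite procedure. Thus, from (9.2.6) and (9.2.7), we have

$$\frac{\nu\, MS^*}{\sigma_e^2 + n\left(1 - \frac{a}{A}\right)\left(1 - \frac{b}{B}\right)\sigma_{\alpha\beta}^2 + an\left(1 - \frac{b}{B}\right)\sigma_\beta^2 + bn\left(1 - \frac{a}{A}\right)\sigma_\alpha^2} \overset{\text{approx}}{\sim} \chi^2[\nu],$$

where

$$MS^* = \left(1 - \frac{a}{A}\right)MS_A + \left(1 - \frac{b}{B}\right)MS_B - \left(1 - \frac{a}{A}\right)\left(1 - \frac{b}{B}\right)MS_{AB} + \frac{ab}{AB}MS_E \tag{9.2.12}$$

is the linear combination of mean squares whose expectation equals the denominator above, that is, $E(MS^*) = abn\operatorname{Var}(\bar{y}_{...})$, and $\nu$ is calculated from

$$\nu = \frac{(MS^*)^2}{\dfrac{(1 - a/A)^2 MS_A^2}{a-1} + \dfrac{(1 - b/B)^2 MS_B^2}{b-1} + \dfrac{(1 - a/A)^2(1 - b/B)^2 MS_{AB}^2}{(a-1)(b-1)} + \dfrac{(ab/AB)^2 MS_E^2}{ab(n-1)}}. \tag{9.2.13}$$

Thus,

$$\frac{\bar{y}_{...} - \mu}{\sqrt{MS^*/(abn)}} \overset{\text{approx}}{\sim} t[\nu], \tag{9.2.14}$$

and a $1 - \alpha$ level confidence interval for $\mu$ is given by

$$\bar{y}_{...} \pm t[\nu, 1 - \alpha/2]\sqrt{MS^*/(abn)}. \tag{9.2.15}$$

These limits are approximate because of the approximation involved in (9.2.14).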
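The pieces of the approximate interval (9.2.15) can be assembled as follows. This Python sketch computes $MS^*$ from (9.2.12), $\nu$ from (9.2.13), and the estimated standard error $\sqrt{MS^*/(abn)}$; the helper names and example numbers are hypothetical. The quantile $t[\nu, 1 - \alpha/2]$ is left to the caller (from a t table, or from `scipy.stats.t.ppf` if SciPy is at hand), since the Python standard library has no t distribution.

```python
import math

def mu_interval_parts(ms_a, ms_b, ms_ab, ms_e, a, b, n, A, B):
    """MS* (9.2.12), Satterthwaite nu (9.2.13), and the standard error of y-bar."""
    fa, fb = 1 - a / A, 1 - b / B
    terms = [                          # (coefficient, mean square, degrees of freedom)
        (fa, ms_a, a - 1),
        (fb, ms_b, b - 1),
        (-fa * fb, ms_ab, (a - 1) * (b - 1)),
        (a * b / (A * B), ms_e, a * b * (n - 1)),
    ]
    ms_star = sum(c * m for c, m, _ in terms)
    if ms_star <= 0:
        raise ValueError("MS* is not positive; the Satterthwaite interval is unusable")
    # Terms with a zero coefficient contribute nothing and are skipped.
    nu = ms_star ** 2 / sum((c * m) ** 2 / df for c, m, df in terms if c != 0)
    se = math.sqrt(ms_star / (a * b * n))
    return ms_star, nu, se

def mu_interval(ybar, se, t_quantile):
    """Approximate limits ybar -/+ t[nu, 1 - alpha/2] * se, as in (9.2.15)."""
    return ybar - t_quantile * se, ybar + t_quantile * se

# Hypothetical inputs: a = 3 of A = 6, b = 4 of B = 8, n = 2.
ms_star, nu, se = mu_interval_parts(20.0, 12.0, 4.0, 2.0, a=3, b=4, n=2, A=6, B=8)
print(ms_star, nu, se)
```

Because $MS^*$ involves a subtraction, it can turn out negative in small samples; the sketch raises an error in that case rather than returning a meaningless interval. With A = a and B = b it reduces to $MS^* = MS_E$ with $\nu = ab(n-1)$.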

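Now, an examination of the mean square and expected mean square columns of Table 9.2 indicates which variance components or their functions can be easily estimated by a confidence interval. Thus, as in the case of the infinite population model, a $100(1 - \alpha)$ percent confidence interval for $\sigma_e^2$ is given by

$$\frac{ab(n-1)MS_E}{\chi^2[ab(n-1), 1 - \alpha/2]} < \sigma_e^2 < \frac{ab(n-1)MS_E}{\chi^2[ab(n-1), \alpha/2]}. \tag{9.2.16}$$

Furthermore,

$$\frac{(a-1)(b-1)MS_{AB}}{\sigma_e^2 + n\sigma_{\alpha\beta}^2} \sim \chi^2[(a-1)(b-1)],$$

and a $1 - \alpha$ level confidence interval for $\sigma_e^2 + n\sigma_{\alpha\beta}^2$ is given by

$$\frac{(a-1)(b-1)MS_{AB}}{\chi^2[(a-1)(b-1), 1 - \alpha/2]} < \sigma_e^2 + n\sigma_{\alpha\beta}^2 < \frac{(a-1)(b-1)MS_{AB}}{\chi^2[(a-1)(b-1), \alpha/2]}. \tag{9.2.17}$$

Likewise,

$$\frac{(b-1)MS_B}{\sigma_e^2 + n\left(1 - \frac{a}{A}\right)\sigma_{\alpha\beta}^2 + an\sigma_\beta^2} \sim \chi^2[b-1],$$

and a $1 - \alpha$ level confidence interval for $\sigma_e^2 + n(1 - a/A)\sigma_{\alpha\beta}^2 + an\sigma_\beta^2$ is given by

$$\frac{(b-1)MS_B}{\chi^2[b-1, 1 - \alpha/2]} < \sigma_e^2 + n\left(1 - \frac{a}{A}\right)\sigma_{\alpha\beta}^2 + an\sigma_\beta^2 < \frac{(b-1)MS_B}{\chi^2[b-1, \alpha/2]}. \tag{9.2.18}$$

Similarly, a $1 - \alpha$ level confidence interval for $\sigma_e^2 + n(1 - b/B)\sigma_{\alpha\beta}^2 + bn\sigma_\alpha^2$ is given by

$$\frac{(a-1)MS_A}{\chi^2[a-1, 1 - \alpha/2]} < \sigma_e^2 + n\left(1 - \frac{b}{B}\right)\sigma_{\alpha\beta}^2 + bn\sigma_\alpha^2 < \frac{(a-1)MS_A}{\chi^2[a-1, \alpha/2]}. \tag{9.2.19}$$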
Unfortunately, as noted in earlier chapters, the expressions in the expected mean square column are the only quantities for which we can find exact confidence intervals in this manner. Thus, there do not exist exact confidence intervals for $\sigma_{\alpha\beta}^2$, $\sigma_\beta^2$, and $\sigma_\alpha^2$, and we have to resort to some approximate results. There are various methods for obtaining approximate intervals (see, e.g., Bennett and Franklin, 1954, Chapter VII). In the following, we give a method for obtaining a conservative confidence interval in the sense that the confidence level is at least $1 - \alpha$. Inasmuch as variance components are nonnegative, it is possible to obtain a conservative confidence interval by simply deleting the undesired terms (nuisance parameters) in the usual confidence intervals obtained by using the chi-square table. For example, from (9.2.17), we can delete the term containing $\sigma_e^2$, and it will yield a conservative $100(1 - \alpha)$ percent confidence interval for $\sigma_{\alpha\beta}^2$ as

$$\frac{(a-1)(b-1)MS_{AB}}{n\,\chi^2[(a-1)(b-1), 1 - \alpha/2]} < \sigma_{\alpha\beta}^2 < \frac{(a-1)(b-1)MS_{AB}}{n\,\chi^2[(a-1)(b-1), \alpha/2]}. \tag{9.2.20}$$

Similarly, from (9.2.18) and (9.2.19), one can obtain conservative $100(1 - \alpha)$ percent confidence intervals for $\sigma_\beta^2$ and $\sigma_\alpha^2$ given by

$$\frac{(b-1)MS_B}{an\,\chi^2[b-1, 1 - \alpha/2]} < \sigma_\beta^2 < \frac{(b-1)MS_B}{an\,\chi^2[b-1, \alpha/2]} \tag{9.2.21}$$

and

$$\frac{(a-1)MS_A}{bn\,\chi^2[a-1, 1 - \alpha/2]} < \sigma_\alpha^2 < \frac{(a-1)MS_A}{bn\,\chi^2[a-1, \alpha/2]}. \tag{9.2.22}$$


9.3. THREE-WAY CROSSED FINITE POPULATION MODEL 


The finite population model for the replicated three-way crossed classification is a natural generalization of the two-way crossed finite population model (9.2.1). Thus, with a, b, and c levels of factors A, B, and C sampled from populations of sizes A, B, and C, respectively, and n replications per cell, the model equation may be written as

$$y_{ijk\ell} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + e_{ijk\ell}, \qquad i = 1, \ldots, a;\; j = 1, \ldots, b;\; k = 1, \ldots, c;\; \ell = 1, \ldots, n, \tag{9.3.1}$$

where the terms on the right-hand side of equation (9.3.1) have the familiar meanings and correspond to the general mean; main effects due to factors A, B, C; interactions A × B, A × C, B × C, A × B × C; and the error term, respectively.

The assumptions of the finite population model (9.3.1) are stated in a way similar to those for the finite two-way model (9.2.1). Thus, for example, we suppose that the $\alpha_i$'s are a random sample of size a from a population of size A, and the $\beta_j$'s and $\gamma_k$'s are random samples of sizes b and c from populations of sizes B and C, respectively. In addition, in the population, the various parameters sum to zero over each index; that is,

$$\sum_{I=1}^{A}\alpha_I = \sum_{J=1}^{B}\beta_J = \sum_{K=1}^{C}\gamma_K = 0,$$

$$\sum_{I=1}^{A}(\alpha\beta)_{IJ} = \sum_{J=1}^{B}(\alpha\beta)_{IJ} = \sum_{I=1}^{A}(\alpha\gamma)_{IK} = \sum_{K=1}^{C}(\alpha\gamma)_{IK} = \sum_{J=1}^{B}(\beta\gamma)_{JK} = \sum_{K=1}^{C}(\beta\gamma)_{JK} = 0,$$

$$\sum_{I=1}^{A}(\alpha\beta\gamma)_{IJK} = \sum_{J=1}^{B}(\alpha\beta\gamma)_{IJK} = \sum_{K=1}^{C}(\alpha\beta\gamma)_{IJK} = 0, \tag{9.3.2}$$


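and we make the following definitions of the population variances, that is, the variance components of model (9.3.1):

$$\sigma_\alpha^2 = \frac{1}{A-1}\sum_{I=1}^{A}\alpha_I^2, \text{ etc.}, \qquad \sigma_{\alpha\beta}^2 = \frac{1}{(A-1)(B-1)}\sum_{I=1}^{A}\sum_{J=1}^{B}(\alpha\beta)_{IJ}^2, \text{ etc.},$$

and

$$\sigma_{\alpha\beta\gamma}^2 = \frac{1}{(A-1)(B-1)(C-1)}\sum_{I=1}^{A}\sum_{J=1}^{B}\sum_{K=1}^{C}(\alpha\beta\gamma)_{IJK}^2. \tag{9.3.3}$$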


The sums of squares and mean squares are the same as those shown in Table 5.2. 
The expected mean squares, which can be derived by an extension of the method 
used for the two-way model (see, e.g., Cornfield and Tukey (1956)), are dis- 
played in Table 9.3. 

Again, it is readily seen that when A = a, B = b, and C = c, the expected values of the mean squares are those given in Table 5.2 for Model I and the finite population model becomes exactly Model I. When A, B, and C are all infinite, the expected values of the mean squares are those given in Table 5.2 for Model II and the finite model becomes exactly Model II. When A is infinite, B = b, and C = c, we have a Model III situation where factor A is random and factors B and C are fixed; and the case when A and B are infinite and C = c gives a Model III with factors A and B as random and factor C as fixed.

TABLE 9.3
Expected Mean Squares for Model (9.3.1)

Source       Expected Mean Square
A            $\sigma_e^2 + n(1 - b/B)(1 - c/C)\sigma_{\alpha\beta\gamma}^2 + cn(1 - b/B)\sigma_{\alpha\beta}^2 + bn(1 - c/C)\sigma_{\alpha\gamma}^2 + bcn\sigma_\alpha^2$
B            $\sigma_e^2 + n(1 - a/A)(1 - c/C)\sigma_{\alpha\beta\gamma}^2 + an(1 - c/C)\sigma_{\beta\gamma}^2 + cn(1 - a/A)\sigma_{\alpha\beta}^2 + acn\sigma_\beta^2$
C            $\sigma_e^2 + n(1 - a/A)(1 - b/B)\sigma_{\alpha\beta\gamma}^2 + an(1 - b/B)\sigma_{\beta\gamma}^2 + bn(1 - a/A)\sigma_{\alpha\gamma}^2 + abn\sigma_\gamma^2$
A × B        $\sigma_e^2 + n(1 - c/C)\sigma_{\alpha\beta\gamma}^2 + cn\sigma_{\alpha\beta}^2$
A × C        $\sigma_e^2 + n(1 - b/B)\sigma_{\alpha\beta\gamma}^2 + bn\sigma_{\alpha\gamma}^2$
B × C        $\sigma_e^2 + n(1 - a/A)\sigma_{\alpha\beta\gamma}^2 + an\sigma_{\beta\gamma}^2$
A × B × C    $\sigma_e^2 + n\sigma_{\alpha\beta\gamma}^2$
Error        $\sigma_e^2$


9.4 FOUR-WAY CROSSED FINITE POPULATION MODEL 


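The finite population model for the four-way classification is the obvious extension of the two- and three-way finite population models, and we survey it only briefly. Thus, with a, b, c, and d levels of factors A, B, C, and D sampled from populations of sizes A, B, C, and D, respectively, and n replicates per cell, the model is

$$\begin{aligned} y_{ijk\ell m} = {} & \mu + \alpha_i + \beta_j + \gamma_k + \delta_\ell + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\alpha\delta)_{i\ell} + (\beta\gamma)_{jk} + (\beta\delta)_{j\ell} + (\gamma\delta)_{k\ell} \\ & + (\alpha\beta\gamma)_{ijk} + (\alpha\beta\delta)_{ij\ell} + (\alpha\gamma\delta)_{ik\ell} + (\beta\gamma\delta)_{jk\ell} + (\alpha\beta\gamma\delta)_{ijk\ell} + e_{ijk\ell m}, \\ & i = 1, \ldots, a;\; j = 1, \ldots, b;\; k = 1, \ldots, c;\; \ell = 1, \ldots, d;\; m = 1, \ldots, n, \end{aligned} \tag{9.4.1}$$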


where the terms on the right-hand side of equation (9.4.1) have familiar meanings. For example, the $\alpha_i$'s are a random sample of size a from a population of size A and sum to zero in the population; that is,

$$\sum_{I=1}^{A}\alpha_I = 0,$$

with corresponding results for the other main effects. Similarly, the two-way interactions sum to zero in the population over each index; for example,

$$\sum_{I=1}^{A}(\alpha\beta)_{IJ} = \sum_{J=1}^{B}(\alpha\beta)_{IJ} = 0,$$

and so on. The three-way and four-way interactions also sum to zero in the population over each index; for example,

$$\sum_{I=1}^{A}(\alpha\beta\gamma)_{IJK} = \sum_{J=1}^{B}(\alpha\beta\gamma)_{IJK} = \sum_{K=1}^{C}(\alpha\beta\gamma)_{IJK} = 0,$$

and so on, and

$$\sum_{I=1}^{A}(\alpha\beta\gamma\delta)_{IJKL} = \sum_{J=1}^{B}(\alpha\beta\gamma\delta)_{IJKL} = \sum_{K=1}^{C}(\alpha\beta\gamma\delta)_{IJKL} = \sum_{L=1}^{D}(\alpha\beta\gamma\delta)_{IJKL} = 0.$$
The sums of squares and mean squares are the same as defined in Section 5.11. 


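The expected mean squares can again be derived by an extension of the method used for the two- and three-way finite population models. Thus, for example,

$$
\begin{aligned}
E(MS_A) = {} & \sigma^2_e + n\Big(1-\frac{b}{B}\Big)\Big(1-\frac{c}{C}\Big)\Big(1-\frac{d}{D}\Big)\sigma^2_{\alpha\beta\gamma\delta}
 + nd\Big(1-\frac{b}{B}\Big)\Big(1-\frac{c}{C}\Big)\sigma^2_{\alpha\beta\gamma} \\
& + nc\Big(1-\frac{b}{B}\Big)\Big(1-\frac{d}{D}\Big)\sigma^2_{\alpha\beta\delta}
 + nb\Big(1-\frac{c}{C}\Big)\Big(1-\frac{d}{D}\Big)\sigma^2_{\alpha\gamma\delta}
 + ncd\Big(1-\frac{b}{B}\Big)\sigma^2_{\alpha\beta} \\
& + nbd\Big(1-\frac{c}{C}\Big)\sigma^2_{\alpha\gamma}
 + nbc\Big(1-\frac{d}{D}\Big)\sigma^2_{\alpha\delta}
 + nbcd\,\sigma^2_{\alpha}, \quad \text{etc.},
\end{aligned}
$$

$$
E(MS_{AB}) = \sigma^2_e + n\Big(1-\frac{c}{C}\Big)\Big(1-\frac{d}{D}\Big)\sigma^2_{\alpha\beta\gamma\delta} + nd\Big(1-\frac{c}{C}\Big)\sigma^2_{\alpha\beta\gamma} + nc\Big(1-\frac{d}{D}\Big)\sigma^2_{\alpha\beta\delta} + ncd\,\sigma^2_{\alpha\beta}, \quad \text{etc.},
$$

$$
E(MS_{ABC}) = \sigma^2_e + n\Big(1-\frac{d}{D}\Big)\sigma^2_{\alpha\beta\gamma\delta} + nd\,\sigma^2_{\alpha\beta\gamma}, \quad \text{etc.},
$$

and

$$
E(MS_{ABCD}) = \sigma^2_e + n\,\sigma^2_{\alpha\beta\gamma\delta}.
$$

Here σ²_α, σ²_αβ, σ²_αβγ, σ²_αβγδ, etc., are defined in a manner analogous to (9.3.3).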
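The coefficients in Table 9.3 and in the four-way expressions above appear to follow a single pattern. In E(MS) for an effect, each variance component whose subscripts include that effect is multiplied by three things:

- n;
- the numbers of sampled levels of the factors absent from the component;
- a correction (1 − sample size/population size) for each factor present in the component but not in the effect.

The Python sketch below encodes that pattern, assuming a balanced, fully crossed design. The function `ems_coefficients` and its argument layout are our own illustrative choices. Setting a population to `math.inf` gives the infinite-population (random) case. Setting a population equal to the sample size makes the corresponding components drop out, as for a fixed factor.

```python
import math
from itertools import combinations


def ems_coefficients(effect, levels, pops, n):
    """Coefficients of the variance components in E(MS_effect).

    Assumes a balanced, fully crossed finite population model.
    effect : factor letters of the effect, e.g. "A" or "AB"
    levels : sampled levels per factor, e.g. {"A": a, "B": b, "C": c}
    pops   : population size per factor; math.inf means an infinite population
    n      : replicates per cell
    Returns {term: coefficient}; the key "e" is the error variance.
    """
    factors = sorted(levels)
    others = [f for f in factors if f not in effect]
    coef = {"e": 1.0}
    # Every term that contains `effect` contributes one variance component.
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            term = "".join(sorted(effect + "".join(extra)))
            c = float(n)
            for f in factors:
                if f not in term:
                    c *= levels[f]              # factors averaged over
            for f in extra:
                c *= 1 - levels[f] / pops[f]    # finite population correction
            coef[term] = c
    return coef


# Three-way case, row A of Table 9.3 (illustrative sizes)
print(ems_coefficients("A", {"A": 2, "B": 3, "C": 4}, {"A": 10, "B": 6, "C": 8}, n=2))
# {'e': 1.0, 'A': 24.0, 'AB': 4.0, 'AC': 3.0, 'ABC': 0.5}

# Infinite populations recover the Model II coefficients
print(ems_coefficients("A", {"A": 3, "B": 4}, {"A": math.inf, "B": math.inf}, n=2))
# {'e': 1.0, 'A': 8.0, 'AB': 2.0}
```

Calling the function with effect "AB" and four factors reproduces the pattern of the four-way E(MS_AB) expression above.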
9.5 NESTED FINITE POPULATION MODELS


Similar to the crossed classification models, we can develop finite population 
models for the nested classification. We briefly consider here the two-way nested 
finite population model. Thus, corresponding to the infinite population model 
(6.1.1), we have 


$$
y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + e_{k(ij)}, \qquad
\begin{cases} i = 1, 2, \ldots, a \\ j = 1, 2, \ldots, b \\ k = 1, 2, \ldots, n, \end{cases}
\tag{9.5.1}
$$


where the usual assumptions of the finite population model (9.5.1) are as fol- 
lows: 


(i) As before, μ is the constant general effect.
(ii) There is a population of α_I's of size A with Σ_{I=1}^{A} α_I = 0, and the α_i's are a random sample of size a from this population.

(iii) Associated with each I is a population of β_{J(I)}'s of size B. For each of these populations of size B, we have the condition that Σ_{J=1}^{B} β_{J(I)} = 0 for each I = 1, 2, ..., A. The β_{j(i)}'s are random samples of size b from these populations. It should be noted, however, that for each value of I, the entire set of B β_{J(I)}'s sums to zero; but, in general, the β_{j(i)}'s do not sum to zero over the sample of b unless b = B. Also, the β_{J(I)}'s do not, in general, sum to zero within a row; that is, Σ_{I=1}^{A} β_{J(I)} ≠ 0.

(iv) The e_{k(ij)}'s are a random sample of size n from an infinite population with mean zero and variance σ²_e.

(v) We make the following definitions of the finite population variances, or the variance components, of model (9.5.1):

$$
\sigma^2_\alpha = \frac{1}{A-1}\sum_{I=1}^{A}\alpha_I^2, \qquad
\sigma^2_\beta = \frac{1}{A(B-1)}\sum_{I=1}^{A}\sum_{J=1}^{B}\beta_{J(I)}^2,
$$

and

$$
\sigma^2_e = E\big(e_{k(ij)}^2\big).
$$


Again, the sums of squares and mean squares are the same as those given in Table 6.2. However, the expected mean squares are those shown in Table 9.4. The derivation of the expected mean squares follows the same general approach as in the crossed case and can be found in Bennett and Franklin (1954, pp. 358-363). Note that in Table 9.4, if both factors A and B constitute the entire population (i.e., A = a and B = b), then the expected values of the mean squares become identical to those given in Table 6.2 for Model I. If both
TABLE 9.4
Analysis of Variance for Model (9.5.1)
Source of           Degrees of   Sum of    Mean      Expected
Variation           Freedom      Squares   Square    Mean Square
Factor A            a − 1        SS_A      MS_A      σ²_e + n(1 − b/B)σ²_β + bnσ²_α
Factor B within A   a(b − 1)     SS_B(A)   MS_B(A)   σ²_e + nσ²_β
Error               ab(n − 1)    SS_E      MS_E      σ²_e
Total               abn − 1      SS_T


A = ∞ and B = ∞, we get Model II; and the case with A = a and B = ∞ gives Model III.

The finite population model (9.5.1) can be extended similarly to higher-order 
nested classifications. 


9.6 UNBALANCED FINITE POPULATION MODELS 


In the preceding sections, we have dealt mainly with finite population models 
having balanced sampling. The details on the models involving unequal sam- 
pling can be found in the papers of Gaylor and Hartwell (1969) and Searle and 
Fawcett (1970). 


9.7 WORKED EXAMPLE FOR A FINITE POPULATION MODEL 


Consider an industrial experiment involving 3 machines and 4 operators. Ma- 
chines were randomly selected from a set of 10 machines and operators were 
chosen at random from a group of 12 available operators. Three observations 
were made on each of the 12 machine-operator combinations and the data on 
production output are given in Table 9.5. 

This is an example of a two-way crossed finite population model with replica- 
tion where both machines and operators are randomly selected from populations 
involving only a finite number of elements. The mathematical model for this 
experiment would be 


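$$
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}, \qquad
\begin{cases} i = 1, 2, 3 \\ j = 1, 2, 3, 4 \\ k = 1, 2, 3, \end{cases}
$$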


where μ is the constant general effect, α_i is the effect of the i-th machine, β_j is the effect of the j-th operator, (αβ)_ij is the interaction effect of the i-th machine with the j-th operator, and e_ijk is the customary error term. Furthermore, it is
TABLE 9.5
Production Output from an Industrial Experiment

                  Operator
Machine      1       2       3       4
1          26.3    26.0    25.7    25.0
           26.9    25.2    26.0    25.3
           27.2    24.6    26.2    25.0
2          26.7    26.0    26.2    25.5
           27.0    26.4    26.1    24.7
           26.9    26.6    27.3    26.0
3          26.8    26.6    26.5    26.2
           27.0    27.0    27.5    25.5
           27.2    26.9    27.0    25.7


assumed that the α_i's are a random sample of size 3 from a population of α_I's that satisfy the condition Σ_{I=1}^{10} α_I = 0; the β_j's are a random sample of size 4 from a population of β_J's that satisfy the condition Σ_{J=1}^{12} β_J = 0; the (αβ)_ij's are a random sample from a population of (αβ)_IJ's that satisfy the conditions Σ_{I=1}^{10} (αβ)_IJ = 0 for each J and Σ_{J=1}^{12} (αβ)_IJ = 0 for each I; and the e_ijk's are a random sample of size 3 from an infinite population with mean zero and variance σ²_e. The population variances of the α_i's, β_j's, and (αβ)_ij's are defined as follows:

$$
\sigma^2_\alpha = \frac{1}{10-1}\sum_{I=1}^{10}\alpha_I^2, \qquad
\sigma^2_\beta = \frac{1}{12-1}\sum_{J=1}^{12}\beta_J^2,
$$

and

$$
\sigma^2_{\alpha\beta} = \frac{1}{(10-1)(12-1)}\sum_{I=1}^{10}\sum_{J=1}^{12}(\alpha\beta)_{IJ}^2.
$$


The analysis of variance computations for degrees of freedom, sums of squares, 
and mean squares are performed as in the case of an infinite population model 
and the results are shown in Table 9.6. 
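As a check on Table 9.6, the sums of squares can be recomputed directly from the data of Table 9.5. The Python sketch below uses the usual balanced two-way formulas. Nothing in it depends on the finite population assumptions, which affect only the expected mean squares. The function name `two_way_anova` is an illustrative choice.

```python
# Table 9.5: data[i][j] holds the three outputs for machine i+1, operator j+1
data = [
    [[26.3, 26.9, 27.2], [26.0, 25.2, 24.6], [25.7, 26.0, 26.2], [25.0, 25.3, 25.0]],
    [[26.7, 27.0, 26.9], [26.0, 26.4, 26.6], [26.2, 26.1, 27.3], [25.5, 24.7, 26.0]],
    [[26.8, 27.0, 27.2], [26.6, 27.0, 26.9], [26.5, 27.5, 27.0], [26.2, 25.5, 25.7]],
]


def two_way_anova(data):
    """Balanced two-way crossed ANOVA with n replicates per cell."""
    a, b, n = len(data), len(data[0]), len(data[0][0])
    ys = [y for row in data for cell in row for y in cell]
    grand = sum(ys) / len(ys)
    row_means = [sum(sum(cell) for cell in row) / (b * n) for row in data]
    col_means = [sum(sum(data[i][j]) for i in range(a)) / (a * n) for j in range(b)]
    cell_means = [sum(cell) / n for row in data for cell in row]
    ss = {
        "A": b * n * sum((m - grand) ** 2 for m in row_means),
        "B": a * n * sum((m - grand) ** 2 for m in col_means),
    }
    ss_cells = n * sum((m - grand) ** 2 for m in cell_means)
    ss["AB"] = ss_cells - ss["A"] - ss["B"]      # interaction
    ss_total = sum((y - grand) ** 2 for y in ys)
    ss["E"] = ss_total - ss_cells                # within-cell error
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (n - 1)}
    ms = {k: ss[k] / df[k] for k in ss}
    return ss, df, ms, ss_total


ss, df, ms, ss_total = two_way_anova(data)
for k, label in [("A", "Machine"), ("B", "Operator"), ("AB", "Interaction"), ("E", "Error")]:
    print(f"{label:<12}{df[k]:>3}{ss[k]:>10.4f}{ms[k]:>8.3f}")
print(f"{'Total':<12}{sum(df.values()):>3}{ss_total:>10.4f}")
```

The printed degrees of freedom, sums of squares and mean squares should agree with Table 9.6 up to rounding.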

Assuming normality, we can test the hypotheses of interest using the results shown in Table 9.6. The expected mean squares provide a valuable guide to deciding which mean squares are to be compared. The interaction hypothesis H₀^AB: σ²_αβ = 0 versus H₁^AB: σ²_αβ > 0 is tested by comparing the ratio
TABLE 9.6
Analysis of Variance for the Production Output Data of Table 9.5
Source of      Degrees of   Sum of     Mean     Expected
Variation      Freedom      Squares    Square   Mean Square
Machine        2            4.6250     2.312    σ²_e + 3(1 − 4/12)σ²_αβ + 4 × 3σ²_α
Operator       3            10.3364    3.445    σ²_e + 3(1 − 3/10)σ²_αβ + 3 × 3σ²_β
Interaction    6            1.6261     0.271    σ²_e + 3σ²_αβ
Error          24           4.5000     0.188    σ²_e
Total          35           21.0875


0.271/0.188 = 1.44 with the percentiles of the theoretical F distribution with (6, 24) degrees of freedom; the ratio is not significant (p = 0.241).

To test the hypothesis regarding the presence of a main effect due to operator (i.e., H₀^B: σ²_β = 0 versus H₁^B: σ²_β > 0), we notice that there does not exist an exact F test. However, as noted in Section 9.2, a conservative test can be performed. Under H₀^B, E(MS_B) = σ²_e + 3(1 − 3/10)σ²_αβ, which does not exceed E(MS_AB) = σ²_e + 3σ²_αβ, so using MS_AB as the denominator errs on the side of not rejecting. Comparing the ratio 3.445/0.271 = 12.71 with the percentiles of the theoretical F distribution with (3, 6) degrees of freedom gives a highly significant result (p = 0.005).

Similarly, a conservative test of the hypothesis regarding the presence of a main effect due to machine (i.e., H₀^A: σ²_α = 0 versus H₁^A: σ²_α > 0) is performed by comparing the ratio 2.312/0.271 = 8.53 with the percentiles of the theoretical F distribution with (2, 6) degrees of freedom. The result is significant at the 2 percent level (p = 0.017).

In addition, we can also perform pseudo-F tests for the hypotheses considered previously. As discussed in Section 9.2, a pseudo-F test for the hypothesis H₀^B: σ²_β = 0 versus H₁^B: σ²_β > 0 is performed using the statistic


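$$
F'_B = \frac{MS_B}{\big(1 - \frac{a}{A}\big) MS_{AB} + \big(\frac{a}{A}\big) MS_E},
$$

which has an approximate F distribution with 3 and ν_B degrees of freedom, where

$$
\nu_B = \frac{\Big[\big(1 - \frac{a}{A}\big) MS_{AB} + \big(\frac{a}{A}\big) MS_E\Big]^2}{\dfrac{\big(1 - \frac{a}{A}\big)^2 (MS_{AB})^2}{(a-1)(b-1)} + \dfrac{\big(\frac{a}{A}\big)^2 (MS_E)^2}{ab(n-1)}}.
$$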
In the example at hand, F'_B and ν_B (the latter rounded to the nearest integer) are found to be

$$
F'_B = \frac{3.445}{\big(1 - \frac{3}{10}\big)(0.271) + \big(\frac{3}{10}\big)(0.188)} = 14.00
$$

and

$$
\nu_B = \frac{\Big[\big(1 - \frac{3}{10}\big)(0.271) + \big(\frac{3}{10}\big)(0.188)\Big]^2}{\dfrac{\big(1 - \frac{3}{10}\big)^2 (0.271)^2}{6} + \dfrac{\big(\frac{3}{10}\big)^2 (0.188)^2}{24}} = 10.
$$


These values lead to essentially the same result as the conservative test obtained earlier, with even higher significance (p < 0.001). Similarly, a pseudo-F test for the hypothesis H₀^A: σ²_α = 0 versus H₁^A: σ²_α > 0 is carried out using the statistic


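$$
F'_A = \frac{MS_A}{\big(1 - \frac{b}{B}\big) MS_{AB} + \big(\frac{b}{B}\big) MS_E},
$$

which has an approximate F distribution with 2 and ν_A degrees of freedom, where

$$
\nu_A = \frac{\Big[\big(1 - \frac{b}{B}\big) MS_{AB} + \big(\frac{b}{B}\big) MS_E\Big]^2}{\dfrac{\big(1 - \frac{b}{B}\big)^2 (MS_{AB})^2}{(a-1)(b-1)} + \dfrac{\big(\frac{b}{B}\big)^2 (MS_E)^2}{ab(n-1)}}.
$$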


Again, in the example at hand, the values of F'_A and ν_A (the latter rounded to the nearest integer) are found to be

$$
F'_A = \frac{2.312}{\big(1 - \frac{4}{12}\big)(0.271) + \big(\frac{4}{12}\big)(0.188)} = 9.50
$$

and

$$
\nu_A = \frac{\Big[\big(1 - \frac{4}{12}\big)(0.271) + \big(\frac{4}{12}\big)(0.188)\Big]^2}{\dfrac{\big(1 - \frac{4}{12}\big)^2 (0.271)^2}{6} + \dfrac{\big(\frac{4}{12}\big)^2 (0.188)^2}{24}} = 11.
$$


These values also lead to essentially the same result as the conservative test 
obtained earlier with even higher significance (p = 0.004). 
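The pseudo-F arithmetic above is easy to script. The Python sketch below takes the rounded mean squares of Table 9.6 as input. The argument `frac` is the sampling fraction of the other factor: a/A = 3/10 when testing operators, and b/B = 4/12 when testing machines. The function name is illustrative. p-values are not computed, since that needs an F distribution routine that we do not assume is available.

```python
def pseudo_f(ms_main, ms_ab, ms_e, frac, df_ab, df_e):
    """Pseudo-F ratio and Satterthwaite degrees of freedom for a main effect.

    The denominator (1 - frac) * MS_AB + frac * MS_E has the same expectation
    as the main-effect mean square when that main-effect variance is zero.
    """
    c_ab, c_e = 1 - frac, frac
    denom = c_ab * ms_ab + c_e * ms_e
    nu = denom ** 2 / ((c_ab * ms_ab) ** 2 / df_ab + (c_e * ms_e) ** 2 / df_e)
    return ms_main / denom, nu


# Rounded mean squares from Table 9.6
ms_a, ms_b, ms_ab, ms_e = 2.312, 3.445, 0.271, 0.188
f_b, nu_b = pseudo_f(ms_b, ms_ab, ms_e, frac=3 / 10, df_ab=6, df_e=24)  # operators
f_a, nu_a = pseudo_f(ms_a, ms_ab, ms_e, frac=4 / 12, df_ab=6, df_e=24)  # machines
print(f"operators: F' = {f_b:.2f}, nu = {nu_b:.0f}")  # operators: F' = 14.00, nu = 10
print(f"machines:  F' = {f_a:.2f}, nu = {nu_a:.0f}")  # machines:  F' = 9.50, nu = 11
```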

Now, to assess the relative contribution of individual variance components, we may obtain their estimates using formulae (9.2.8) through (9.2.11). Thus, we find that

$$
\hat\sigma^2_e = 0.188,
$$

$$
\hat\sigma^2_{\alpha\beta} = \frac{1}{3}(0.271 - 0.188) = 0.028,
$$

$$
\hat\sigma^2_\beta = \frac{1}{3 \times 3}\Big[3.445 - \Big(1 - \frac{3}{10}\Big)(0.271) - \Big(\frac{3}{10}\Big)(0.188)\Big] = 0.355,
$$

and

$$
\hat\sigma^2_\alpha = \frac{1}{4 \times 3}\Big[2.312 - \Big(1 - \frac{4}{12}\Big)(0.271) - \Big(\frac{4}{12}\Big)(0.188)\Big] = 0.172.
$$


These components (σ²_e, σ²_αβ, σ²_β, and σ²_α) account for 25.3, 3.8, 47.8, and 23.1 percent of the total variation, respectively. The results are consistent with the tests of hypotheses performed earlier.
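The estimators above, and the percentage shares, can be reproduced as follows. This is a Python sketch under the two-way crossed finite population model, using the rounded mean squares of Table 9.6. The function name is illustrative. Each estimate is rounded to three decimals before the shares are formed, which is the rounding that yields the percentages quoted in the text.

```python
def variance_components(ms_a, ms_b, ms_ab, ms_e, a, b, n, A, B):
    """ANOVA estimators for the two-way crossed finite population model.

    Each estimator subtracts the combination of mean squares whose expectation
    matches E(MS) apart from the target component, then divides by that
    component's coefficient.
    """
    return {
        "e": ms_e,                                                          # error
        "AB": (ms_ab - ms_e) / n,                                           # interaction
        "B": (ms_b - (1 - a / A) * ms_ab - (a / A) * ms_e) / (a * n),       # operators
        "A": (ms_a - (1 - b / B) * ms_ab - (b / B) * ms_e) / (b * n),       # machines
    }


est = variance_components(2.312, 3.445, 0.271, 0.188, a=3, b=4, n=3, A=10, B=12)
est = {k: round(v, 3) for k, v in est.items()}   # round as in the text
total = sum(est.values())
shares = {k: 100 * v / total for k, v in est.items()}
for k in est:
    print(f"{k:<3}{est[k]:>7.3f}{shares[k]:>7.1f}%")
```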

We can further proceed to obtain confidence intervals for the variance components. To determine a 95 percent confidence interval for σ²_e, we have

MS_E = 0.188,   χ²[24, 0.025] = 12.401,   and   χ²[24, 0.975] = 39.364.

Substituting these values in (9.2.16), with SS_E = 24 × MS_E = 4.500 (using the unrounded MS_E = 0.1875), the desired 95 percent confidence interval for σ²_e is given by

$$
P\Big[\frac{4.500}{39.364} < \sigma^2_e < \frac{4.500}{12.401}\Big] = 0.95
$$

or

$$
P[0.114 < \sigma^2_e < 0.363] = 0.95.
$$
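The exact interval for σ²_e uses only SS_E and two chi-square percentage points. The minimal Python sketch below takes those percentage points as inputs: 12.401 and 39.364, the usual tabulated 2.5 and 97.5 percent points for 24 degrees of freedom. This avoids assuming a statistics library. The function name is illustrative.

```python
def error_variance_ci(ss_e, chi2_lower, chi2_upper):
    """Interval for sigma_e^2 based on SS_E / sigma_e^2 ~ chi-square(df_E).

    chi2_lower and chi2_upper are the lower and upper percentage points
    (here 2.5 and 97.5 percent) for df_E degrees of freedom, read from tables.
    """
    return ss_e / chi2_upper, ss_e / chi2_lower


ss_e = 24 * 0.1875          # SS_E = 4.5 from Table 9.6
low, high = error_variance_ci(ss_e, chi2_lower=12.401, chi2_upper=39.364)
print(f"({low:.3f}, {high:.3f})")   # (0.114, 0.363)
```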
As noted in Section 9.2, there do not exist exact confidence intervals for σ²_αβ, σ²_β, and σ²_α; however, we can obtain conservative confidence intervals for them. Using formulae (9.2.20) through (9.2.22), it can be verified that the conservative confidence intervals for σ²_αβ, σ²_β, and σ²_α are given as follows:

P[0.038 < σ²_αβ < 0.437] ≥ 0.95,
P[0.123 < σ²_β < 5.220] ≥ 0.95,

and

P[0.052 < σ²_α < 7.707] ≥ 0.95.


It should be remarked that estimates of variance components are, in general, highly variable, and the lengths of the confidence intervals given previously attest to the fact that there is a great deal of uncertainty involved in these estimates.

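Finally, we construct a confidence interval for the general constant μ. As noted earlier, there does not exist an exact confidence interval for this problem. However, we can obtain an approximate 95 percent confidence interval for μ as

$$
\bar{y}_{\ldots} \pm t[\nu, 0.975]\sqrt{\frac{MS}{abn}},
$$

where MS and ν are defined as in (9.2.12) and (9.2.13), respectively. For the example at hand, ȳ..., MS, ν (rounded to the nearest integer), and t[ν, 0.975] are found to be

$$
\bar{y}_{\ldots} = 26.242,
$$

$$
MS = \Big(\frac{3 \times 4}{10 \times 12}\Big)(0.188) - \Big(1 - \frac{3}{10}\Big)\Big(1 - \frac{4}{12}\Big)(0.271) + \Big(1 - \frac{4}{12}\Big)(3.445) + \Big(1 - \frac{3}{10}\Big)(2.312) = 3.807,
$$

$$
\nu = \frac{(3.807)^2}{\dfrac{\big(\frac{3 \times 4}{10 \times 12}\big)^2 (0.188)^2}{24} + \dfrac{\big(1-\frac{3}{10}\big)^2\big(1-\frac{4}{12}\big)^2 (0.271)^2}{6} + \dfrac{\big(1-\frac{4}{12}\big)^2 (3.445)^2}{3} + \dfrac{\big(1-\frac{3}{10}\big)^2 (2.312)^2}{2}} = 5,
$$

and

$$
t[5, 0.975] = 2.571.
$$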




Substituting these values in the preceding formula, the desired 95 percent confidence limits for μ are determined as

$$
26.242 \pm 2.571\sqrt{\frac{3.807}{36}} = (25.406,\ 27.078).
$$
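The interval for μ combines four mean squares with the coefficients shown above. The sketch below is Python and follows that combination, assuming the two-way crossed finite population model. Its `t_crit` argument is t[ν, 0.975], read from tables after rounding ν (2.571 for ν = 5); it is passed in rather than computed. The function name is illustrative. The negative coefficient on MS_AB makes this a Satterthwaite-type approximation with a difference of mean squares.

```python
import math


def mean_ci(ybar, ms_a, ms_b, ms_ab, ms_e, a, b, n, A, B, t_crit):
    """Approximate interval for the general mean mu.

    MS is the combination of mean squares whose expectation equals
    abn * Var(ybar); nu is its Satterthwaite degrees of freedom.
    """
    fa, fb = a / A, b / B
    parts = [  # (coefficient, mean square, degrees of freedom)
        (1 - fa, ms_a, a - 1),
        (1 - fb, ms_b, b - 1),
        (-(1 - fa) * (1 - fb), ms_ab, (a - 1) * (b - 1)),
        (fa * fb, ms_e, a * b * (n - 1)),
    ]
    ms = sum(c * m for c, m, _ in parts)
    nu = ms ** 2 / sum((c * m) ** 2 / d for c, m, d in parts)
    half = t_crit * math.sqrt(ms / (a * b * n))
    return ms, nu, (ybar - half, ybar + half)


ms, nu, (lo, hi) = mean_ci(26.242, 2.312, 3.445, 0.271, 0.188,
                           a=3, b=4, n=3, A=10, B=12, t_crit=2.571)
print(f"MS = {ms:.3f}, nu = {nu:.2f}, interval = ({lo:.3f}, {hi:.3f})")
```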


9.8 OTHER MODELS 


Throughout this volume, we have been concerned mainly with such terms as 
Models I, II, and III, or fixed, random, and mixed models, depending on the 
nature of the factors in the experiment. For Model I, it was assumed that for all 
factors the levels employed in the experiment make up the population of levels 
of interest. When the levels used in the experiment constitute a sample from an 
infinite population of levels, Model II is appropriate. A case involving at least one factor fixed and others random was termed Model III.

In the preceding sections of this chapter, we have considered the so-called finite population models. In these, the error terms are assumed to be random variables from an infinite population, but the levels of the factors are assumed to be random samples from a finite population of levels.

Use is made of the fact that the variance of the mean of a random sample of size n from a finite population of size N with variance σ² (defined with divisor N − 1) is given by (1 − n/N)σ²/n. The extra factor (1 − n/N) is known as the finite population correction. If we let f = n/N, the finite population correction is 1 − f. In this way, the tables of the expected values of the mean squares for various crossed and nested classifications were readily obtained.
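The finite population correction can be checked by brute force. Enumerate every sample of size n drawn without replacement from a small population, then compare the variance of the sample means with (1 − n/N)σ²/n, where σ² uses the divisor N − 1. The Python sketch below does this. The population values are arbitrary and purely hypothetical.

```python
from itertools import combinations
from statistics import mean, variance

population = [3, 7, 8, 12, 15]   # hypothetical finite population
N, n = len(population), 2

sigma2 = variance(population)    # divisor N - 1
mu = mean(population)
# Under simple random sampling every one of the C(N, n) samples is equally likely
sample_means = [mean(s) for s in combinations(population, n)]
var_of_mean = sum((m - mu) ** 2 for m in sample_means) / len(sample_means)

fpc_formula = (1 - n / N) * sigma2 / n
print(var_of_mean, fpc_formula)
```

Both quantities equal 6.45 for this population, apart from floating-point rounding.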

Tukey (1949a) emphasized the restrictiveness of these models and proposed extending their range by defining more complex models. These models have received very little attention in the statistical literature, except in some theoretical works. It is not possible to provide any further discussion of this topic here. Plackett (1960) presents an excellent review of many of these models.


9.9 USE OF STATISTICAL COMPUTING PACKAGES 


The use of SAS, SPSS, and BMDP programs for analyzing finite population 
models is the same as described in earlier chapters for crossed, nested, and par- 
tially nested factors. The computations of degrees of freedom, sums of squares, 
and mean squares as obtained earlier also remain valid for the finite population 
models. However, the expected values of mean squares must be provided using 
the results outlined in this chapter. The results on tests of hypotheses, point 
estimates, and confidence intervals can then be obtained using procedures de- 
veloped in this chapter. 


EXERCISES 


1. Consider a two-way crossed finite population model involving three varieties of wheat and three different fertilizers. The three varieties of wheat are selected randomly from a finite population of nine varieties of interest to the experimenter. Similarly, three fertilizers are taken at random from a finite population of twelve fertilizers available for the experiment. The data on yields in bushels/acre are given as follows.

                  Fertilizer
   Variety      1      2      3
   I           60     52     65
               61     50     66
               62     58     68
   II          75     60     71
               79     61     72
               77     62     73
   III         76     59     74
               77     60     75
               78     61     77


(a) State the mathematical model and the assumptions for the experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are significant interaction effects between varieties and fertilizers. Use α = 0.05.

(d) Perform conservative and pseudo-F tests to determine whether there are differences in yields among the varieties of wheat. Use α = 0.05.

(e) Perform conservative and pseudo-F tests to determine whether there are differences in yields among different fertilizers. Use α = 0.05.

(f) Estimate the variance components of the model.

(g) Determine an exact 95 percent confidence interval for the error variance component.

(h) Determine approximate 95 percent confidence intervals for other variance components of the model.

(i) Determine an approximate 95 percent confidence interval for the general mean μ using the Satterthwaite procedure.


Some Simple 
Experimental Designs 


10.0 PREVIEW 


In the previous chapters we developed techniques suitable for analyzing experi- 
mental data. It is important at this point to consider the manner in which the 
experimental data were collected as this greatly influences the choice of the 
proper technique for data analysis. If an experiment has been properly designed 
or planned, the data will have been collected in the most efficient manner for the 
problem being considered. Experimental design is the sequence of steps initially 
taken to ensure that the data will be obtained in such a way that analysis will lead
immediately to valid statistical inferences. The purpose of statistically designing 
an experiment is to collect the maximum amount of useful information with a 
minimum expenditure of time and resources. It is important to remember that 
the design of the experiment should be as simple as possible consistent with 
the objectives and requirements of the problem. The purpose of this chapter 
is to introduce some basic principles of experimental design and discuss some 
commonly employed experimental designs of general application.


10.1 PRINCIPLES OF EXPERIMENTAL DESIGN 


Three basic principles in designing an experiment are: replication, random- 
ization, and control. The application of these principles ensures validity of the 
analysis and increases its sensitivity, and thus they are crucial to any scientific 
experiment. We briefly discuss each of these principles in the following. 


REPLICATION 


The first principle of a designed experiment is replication, which is merely a 
complete repetition of the basic experiment. It refers to running all the treatment 
combinations again, at a later time period, where each treatment is applied to
several experimental units. It provides an estimate of the magnitude of the 
experimental error and also makes tests of significance of effects possible. 


RANDOMIZATION 


The second principle of a designed experiment is that of randomization, which helps to ensure against any unintentional bias in the experimental units and/or treatment combinations and can form a sound basis for statistical inference.

H. Sahai et al., The Analysis of Variance
© Springer Science+Business Media New York 2000

Here, an experimental unit is a unit to which a single treatment combination is 
applied in a single replication of the experiment. The term treatment or treat- 
ment combinations means the experimental conditions that are imposed on an 
experimental unit in a particular experiment. If the data are random, it is safe to 
assume that the experimental errors are independently distributed. However, er- 
rors associated with the experimental units that are adjacent in time or space will 
tend to be correlated, thus violating the assumption of independence. Random- 
ization helps to make this correlation as small as possible so that the analysis 
can be carried out as though the assumption of independence were true. Fur- 
thermore, it allows for unbiased estimates and valid tests of significance of the 
effects of treatments. In addition, although many extraneous variables affecting 
the response in a designed experiment do not vary in a completely random manner, it is reasonable to assume that their cumulative effect varies in a random
manner. The randomization of treatments to experimental units has the effect of 
randomly assigning the error terms (associated with experimental units) to the 
treatments and thus satisfying the assumptions required for the validity of sta- 
tistical inference. The idea was originally introduced by Fisher (1926) and has 
been further elaborated by Greenberg (1951), Kempthorne¹ (1955, 1977), and
Lorenzen (1984). There are a number of randomization methods available for 
assigning treatments to experimental units (see, e.g., Cochran and Cox (1957); 
Cox (1958a)). 


CONTROL 


The third principle of a designed experiment is that of control, which refers 
to the way in which experimental units in a particular design are balanced, 
blocked, and grouped. Balancing means the assignment of the treatment com- 
binations to the experimental units in such a way that a balanced or systematic 
configuration is obtained. Otherwise, it is unbalanced or we simply say that 
there are missing data. Blocking is the assignment of experimental units to 
blocks in such a manner that the units within a particular block are as ho- 
mogeneous as possible. Grouping refers to the placement of homogeneous 
experimental units into different groups to which separate treatments may be 
assigned. Balancing, blocking, and grouping can be achieved in various ways 
and at various stages of the experiment and their choice is indicated by the 
availability of the experimental conditions. The application of control results 
in the reduction of experimental error, which in turn leads to a more sensitive 
analysis. 

Detailed discussions of these and other principles involved in designing an experiment can be found in books on experimental design (see, e.g., Cochran and Cox (1957); Cox (1958a)). In the succeeding sections we discuss some simple experimental designs for general application. Complex designs employed in many agricultural and biomedical experiments are not considered here. The reader is referred to excellent books by Federer (1955), Cochran and Cox (1957), Gill (1978), Fleiss (1986), Hicks (1987), Winer et al. (1991), Hinkelmann and Kempthorne (1994), Kirk (1995), Steel et al. (1997), among others, for a discussion of designs not included here. Also, complete details of mathematical models and statistical analysis are not given here, since they readily follow from the same type of statistical models and the principle of partitioning of the sum of squares described in full detail in earlier chapters.

¹ Kempthorne (1977) stresses the necessity of randomization for the validity of error assumptions.


Remark: There is a voluminous literature on experimental design and many excellent 
sources of reference are currently available. Herzberg and Cox (1959) have given bibli- 
ographies on experimental designs. Federer and Balaam (1972) provided an exclusive 
bibliography on designs (the arrangement of treatments in an experiment) and treatment
designs (the selection of treatments employed in an experiment) for the period prior to 
1968. Federer and Federer (1973) presented a partial bibliography on statistical designs 
for the period 1968 through 1971. Federer (1980, 1981a,b) in a three-part article gave a 
bibliography on experimental designs from 1972 through 1978. For recent developments 
in design of experiments, covering the literature of 1975 through 1980, see Atkinson
(1982). For an annotated bibliography of the books on design of experiments see Hahn 
(1982). 


10.2 COMPLETELY RANDOMIZED DESIGN


In a completely randomized design, the treatments are allocated entirely by 
chance. In other words, all experimental units are considered the same and no 
division or grouping among them exists. The design is entirely flexible in that 
any number of treatments or replications may be used. The replications may 
vary from treatment to treatment and all available experimental material can 
be utilized. Other advantages of this design include the simplicity of
the statistical analysis, even in the case of missing data. The relative loss of
information due to missing data is less for the completely randomized design 
than for any other design. 

In a completely randomized design, all the variability among the experimental units goes into the experimental error. The completely randomized design should be used when the experimental material is homogeneous or when missing values are expected to occur. The design is also appropriate in small experiments when the increase in accuracy from other designs does not outweigh the loss of degrees of freedom for experimental error. The main disadvantage of the completely randomized design is that it is often inefficient: because all variability among experimental units goes into the experimental error, any heterogeneity among the units inflates that error.
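Allocating treatments "entirely by chance" amounts to a single random permutation of the treatment labels over all experimental units. The minimal Python sketch below assumes equal replication. The function name and the `seed` argument are illustrative; `random.Random` is used only so that a particular layout can be reproduced.

```python
import random


def crd_layout(treatments, reps, seed=None):
    """Assign each treatment to `reps` experimental units completely at random.

    Returns a list whose k-th entry is the treatment for unit k + 1.
    """
    units = [t for t in treatments for _ in range(reps)]
    random.Random(seed).shuffle(units)   # one randomization over all units
    return units


layout = crd_layout(["A", "B", "C", "D", "E"], reps=4, seed=1)
print(layout)
```

Unequal replication would only require building `units` from a separate count for each treatment.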


MODEL AND ANALYSIS 


If we take n_i replications for each treatment or treatment combination in a completely random manner, then the analysis of variance model for the experiment
TABLE 10.1
Analysis of Variance for the Completely Randomized Design with Equal Sample Sizes
Source of   Degrees of   Sums of   Mean     Expected Mean Square
Variation   Freedom      Squares   Square   Model I                       Model II        F Value
Treatment   a − 1        SS_τ      MS_τ     σ²_e + (n/(a − 1)) Σ τ_i²     σ²_e + nσ²_τ    MS_τ/MS_E
Error       a(n − 1)     SS_E      MS_E     σ²_e                          σ²_e
Total       an − 1       SS_T
is given by

$$
y_{ij} = \mu + \tau_i + e_{ij}, \qquad
\begin{cases} i = 1, 2, \ldots, a \\ j = 1, 2, \ldots, n_i, \end{cases}
\tag{10.2.1}
$$


where y_ij is the j-th observation corresponding to the i-th treatment, −∞ < μ < ∞ is the general mean, τ_i is the effect due to the i-th treatment, and e_ij is the error associated with the i-th treatment and the j-th observation. As before, the assumptions inherent in the model are linearity, normality, additivity, independence, and homogeneity of variances. Clearly, model (10.2.1) is the same as the one-way classification model (2.1.1), and the appropriate analysis will be the one-way analysis of variance as described in Chapter 2.

There are two models associated with the model equation (10.2.1). Model I is concerned with only the treatments present in the experiment, and under Model II the treatments are assumed to be a random sample from an infinite population of treatments. Model I requires that Σ_{i=1}^{a} n_i τ_i = 0, and under Model II the τ_i's are assumed to be normal random variables with mean zero and variance σ²_τ. The steps in the analysis of this model are identical to those discussed in Chapter 2. The complete analysis of variance for the balanced case (i.e., when n_1 = n_2 = ... = n_a = n) is shown in Table 10.1 and that for the unbalanced case in Table 10.2.

The hypothesis of interest under fixed effects or Model I is

$$
\begin{aligned}
H_0&: \tau_1 = \tau_2 = \cdots = \tau_a = 0 \\
\text{versus} \quad H_1&: \text{at least one } \tau_i \neq 0.
\end{aligned}
\tag{10.2.2}
$$


In Model II, we are still interested in the hypothesis of no treatment effects; however, the τ_i's are random variables with mean zero and variance σ²_τ. In this case the hypothesis of no treatment effects is

$$
H_0: \sigma^2_\tau = 0 \quad \text{versus} \quad H_1: \sigma^2_\tau > 0.
\tag{10.2.3}
$$
TABLE 10.2
Analysis of Variance for the Completely Randomized Design with Unequal Sample Sizes
Source of   Degrees of    Sums of   Mean     Expected Mean Square
Variation   Freedom       Squares   Square   Model I                       Model II*          F Value
Treatment   a − 1         SS_τ      MS_τ     σ²_e + Σ n_i τ_i²/(a − 1)     σ²_e + n_0σ²_τ     MS_τ/MS_E
Error       Σ n_i − a     SS_E      MS_E     σ²_e                          σ²_e
Total       Σ n_i − 1     SS_T

* n_0 = (N² − Σ n_i²)/[N(a − 1)], where N = Σ_{i=1}^{a} n_i.


The statistic F = MS_τ/MS_E, which under the null hypothesis has an F distribution with a − 1 and a(n − 1) (or Σ n_i − a, for the unbalanced case) degrees of freedom, is used to test the hypothesis (10.2.2) or (10.2.3). A more general hypothesis on σ²_τ may be of the form

$$
H_0': \sigma^2_\tau/\sigma^2_e \le \rho_0 \quad \text{versus} \quad H_1': \sigma^2_\tau/\sigma^2_e > \rho_0,
$$

where ρ_0 is a specified value of ρ_τ = σ²_τ/σ²_e. As in (2.7.11), this hypothesis is tested by the statistic (1 + nρ_0)^{-1}(MS_τ/MS_E), which, when ρ_τ = ρ_0, has an F distribution with a − 1 and a(n − 1) degrees of freedom.

For the estimation of the variance components σ²_e and σ²_τ, which are of interest under Model II, we can, as before, employ the analysis of variance procedure. The estimators thus obtained are given by

$$
\hat\sigma^2_e = MS_E \quad \text{and} \quad \hat\sigma^2_\tau = \frac{1}{n}(MS_\tau - MS_E).
\tag{10.2.4}
$$
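The estimators (10.2.4) and the statistic for H₀' are simple functions of the two mean squares. The Python sketch below assumes the balanced Model II design. The function names and the mean square values in the usage lines are hypothetical, chosen only to show the arithmetic.

```python
def crd_variance_components(ms_tr, ms_e, n):
    """ANOVA estimators (10.2.4) for the balanced Model II design."""
    s2_e = ms_e
    s2_tau = (ms_tr - ms_e) / n   # can be negative when MS_tau < MS_E
    return s2_e, s2_tau


def ratio_test_statistic(ms_tr, ms_e, n, rho0):
    """(1 + n*rho0)^(-1) * MS_tau / MS_E for H0': sigma_tau^2 / sigma_e^2 <= rho0."""
    return (ms_tr / ms_e) / (1 + n * rho0)


# Hypothetical mean squares, for illustration only
print(crd_variance_components(ms_tr=30.0, ms_e=10.0, n=5))   # (10.0, 4.0)
print(ratio_test_statistic(30.0, 10.0, n=5, rho0=0.4))       # 1.0
```

With ρ_0 = 0 the statistic reduces to the ordinary ratio MS_τ/MS_E.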


For all other details of the analysis of the model (10.2.1), refer to Chapter 2. 


WORKED EXAMPLE


Fisher (1958, p. 262) reported data on the weights of mangold roots collected by 
Mercer and Hall in a uniformity trial with 20 strips of land using a completely 
randomized design to test five different treatments each in quadruplicate. The 
data are given in Table 10.3. 
TABLE 10.3
Data on Weights of Mangold Roots in a Uniformity Trial

              Treatment
  A       B       C       D       E
3376    3504    3430    3404    3253
3361    3416    3334    3210    3314
3366    3244    3291    3168    3287
3330    3195    3029    3118    3085

Source: Fisher (1958, p. 262). Used with permission.


TABLE 10.4
Analysis of Variance for the Data on Weights of Mangold Roots
Source of    Degrees of   Sums of       Mean
Variation    Freedom      Squares       Square        F Value   p-Value
Treatment    4            58,725.500    14,681.375    0.95      0.461
Error        15           231,040.250   15,402.683
Total        19           289,765.750


The analysis of variance calculations are readily performed and the results are 
summarized in Table 10.4. The outputs illustrating the applications of statistical 
packages to perform the analysis of variance are presented in Figure 10.1. Here, 
the ratio of mean squares is not significant (p = 0.461) and the conclusion 
would be that there are no significant differences among the treatments. 
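The entries of Table 10.4 can also be reproduced with a few lines of Python, as an alternative to the packages of Figure 10.1. The sketch below computes the one-way sums of squares, mean squares and F ratio for the data of Table 10.3. The function name is illustrative. The p-value is omitted because it needs an F distribution function, for example `scipy.stats.f.sf` in SciPy, which we do not assume is installed.

```python
# Table 10.3: weights of mangold roots, four plots per treatment
weights = {
    "A": [3376, 3361, 3366, 3330],
    "B": [3504, 3416, 3244, 3195],
    "C": [3430, 3334, 3291, 3029],
    "D": [3404, 3210, 3168, 3118],
    "E": [3253, 3314, 3287, 3085],
}


def one_way_anova(groups):
    """One-way ANOVA: treatment and error SS, their mean squares, and F."""
    ys = [y for g in groups.values() for y in g]
    N, a = len(ys), len(groups)
    grand = sum(ys) / N
    ss_total = sum((y - grand) ** 2 for y in ys)
    ss_tr = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    ss_e = ss_total - ss_tr
    ms_tr, ms_e = ss_tr / (a - 1), ss_e / (N - a)
    return ss_tr, ss_e, ms_tr, ms_e, ms_tr / ms_e


ss_tr, ss_e, ms_tr, ms_e, f = one_way_anova(weights)
print(f"Treatment SS = {ss_tr:.3f}, MS = {ms_tr:.3f}")
print(f"Error     SS = {ss_e:.3f}, MS = {ms_e:.3f}, F = {f:.3f}")
```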


10.3 RANDOMIZED BLOCK DESIGN 


If the experimental units are divided into a number of groups and a complete 
replication of all treatments is allocated to each group, we have the so-called ran- 
domized complete block design. The randomized block design was developed 
by Fisher (1926). The randomization is carried out separately in each group of 
experimental units, which is usually designated as a block. Here, an attempt is 
made to contain the major variations between blocks so that the experimental 
error in each group is relatively small. Thus, the blocks may be constructed 
so as to coincide with the degree of variability in experimental material. For 
example, in agricultural experimentation, each observation of, say, yield, comes 
from a plot of land, and we may group adjacent plots that are relatively homo- 
geneous to form a block. In executing the experiment, we randomly allocate 
the treatments to the plots in the first block and then repeat the randomization 
for the second and other remaining blocks. 
(i) SAS application: SAS ANOVA instructions and output for the completely randomized design.

    DATA MANGOLD;
    INPUT TRTMENT $ WEIGHT;
    DATALINES;
    A 3376
    A 3361
    . . .
    E 3085
    ;
    PROC ANOVA;
    CLASSES TRTMENT;
    MODEL WEIGHT=TRTMENT;
    RUN;

    The SAS System
    Analysis of Variance Procedure
    CLASS     LEVELS   VALUES
    TRTMENT   5        A B C D E
    NUMBER OF OBS. IN DATA SET=20

    Dependent Variable: WEIGHT
                             Sum of         Mean
    Source           DF      Squares        Square        F Value   Pr > F
    Model            4       58725.5000     14681.3750    0.95      0.4610
    Error            15      231040.2500    15402.6833
    Corrected Total  19      289765.7500

    R-Square    C.V.        Root MSE     WEIGHT Mean
    0.202665    3.7771452   124.10755    3285.7500000

    Source    DF   Anova SS    Mean Square   F Value   Pr > F
    TRTMENT   4    58725.50    14681.37      0.95      0.4610

(ii) SPSS application: SPSS ONEWAY instructions and output for the completely randomized design.

    DATA LIST
     /TRTMENT 1 WEIGHT 3-6.
    BEGIN DATA.
    1 3376
    1 3361
    1 3366
    1 3330
    . . .
    5 3085
    END DATA.
    ONEWAY WEIGHT BY TRTMENT (1,5)
     /STATISTICS=ALL.

    Test of Homogeneity of Variances
              Levene Statistic   df1   df2   Sig.
    WEIGHT    1.940              4     15    .156

    ANOVA
    WEIGHT           Sum of Squares   df   Mean Square   F      Sig.
    Between Groups   58725.500        4    14681.375     .953   .461
    Within Groups    231040.250       15   15402.683
    Total            289765.750       19

(iii) BMDP application: BMDP 7D instructions and output for the completely randomized design.

    /INPUT     FILE='C:\SAHAI\TEXTO\EJE23.TXT'.
               FORMAT=FREE.
               VARIABLES=2.
    /VARIABLE  NAMES=TRT,WEIGHT.
    /GROUP     CODES(TRT)=1,2,3,4,5.
               NAMES(TRT)=A,B,C,D,E.
    /HISTOGRAM GROUPING=TRT.
               VARIABLES=WEIGHT.
    /END
    1 3376
    1 3361
    . . .
    5 3085

    BMDP7D - ONE- AND TWO-WAY ANALYSIS OF VARIANCE WITH DATA SCREENING
    Release: 7.0 (BMDP/DYNAMIC)

    ANALYSIS OF VARIANCE TABLE FOR MEANS
    SOURCE    SUM OF SQUARES   DF   MEAN SQUARE   F VALUE   PROB.
    TRTMENT   58725.5000       4    14681.3750    0.95
    ERROR     231040.2500      15   15402.6833

    EQUALITY OF MEANS TESTS; VARIANCES ARE NOT ASSUMED TO BE EQUAL
    WELCH
    BROWN-FORSYTHE

FIGURE 10.1 Program Instructions and Output for the Completely Randomized Design: Data on Weights of Mangold Roots in a Uniformity Trial (Table 10.3).
To illustrate the layout of a randomized block design, let us consider eight treatments, say, T_1, T_2, ..., T_8, corresponding to eight levels of a factor to be included in each of five blocks. Figure 10.2 shows such an experimental layout. Note that the treatments are randomly allocated within each block. This layout is quite different from that of a completely randomized experiment, where there would be a single randomization of the eight treatments, each repeated five times, over all 40 plots.


FIGURE 10.2 A Layout of a Randomized Block Design. 


MODEL AND ANALYSIS 


The analysis of variance model for a randomized complete block design with 
one observation per experimental unit is given by 


$$y_{ij} = \mu + \beta_i + \tau_j + e_{ij}, \qquad i = 1, 2, \ldots, b;\ j = 1, 2, \ldots, t, \qquad (10.3.1)$$


where $y_{ij}$ denotes the observed value corresponding to the $i$-th block and the $j$-th treatment; $-\infty < \mu < \infty$ is the general mean, $\beta_i$ is the effect of the $i$-th block, $\tau_j$ is the effect of the $j$-th treatment, and $e_{ij}$ is the customary error term. Clearly, the model (10.3.1) is the same as the model equation (3.1.1) for the two-way crossed classification with one observation per cell. Thus, the analysis is identical to that discussed in Chapter 3, the only difference being that factor A now designates “blocks” and factor B denotes “treatments.” There are three versions of model (10.3.1) (i.e., Models I, II, and III), depending on whether blocks, treatments, or both are chosen at random.


Both Blocks and Treatments Fixed 
In a randomized block experiment, both blocks and treatments may be fixed. In this case, the $\beta_i$'s and $\tau_j$'s are fixed constants with the restrictions

$$\sum_{i=1}^{b} \beta_i = 0, \qquad \sum_{j=1}^{t} \tau_j = 0,$$

and the $e_{ij}$'s are normal random variables with mean zero and variance $\sigma_e^2$. The analysis of variance in Table 3.2 can now be rewritten in the notation of model (10.3.1) as shown in Table 10.5.
Here, the hypothesis

$$H_0^{\tau}: \tau_1 = \tau_2 = \cdots = \tau_t = 0 \quad \text{versus} \quad H_1^{\tau}: \tau_j \neq 0 \text{ for at least one } j,\ j = 1, 2, \ldots, t, \qquad (10.3.2)$$

is of primary interest and is tested by the statistic

$$F_\tau = \frac{MS_\tau}{MS_E}.$$

If $F_\tau > F[t-1, (b-1)(t-1); 1-\alpha]$, then $H_0^{\tau}$ is rejected, and we conclude that there are significant differences among the treatments.
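The hypothesis

$$H_0^{\beta}: \beta_1 = \beta_2 = \cdots = \beta_b = 0 \quad \text{versus} \quad H_1^{\beta}: \beta_i \neq 0 \text{ for at least one } i,\ i = 1, 2, \ldots, b, \qquad (10.3.3)$$

although of minor importance, may be tested in a similar manner by the statistic

$$F_B = \frac{MS_B}{MS_E}.$$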


However, because of the manner in which the experiment is set up, the hypothesis (10.3.3) should not be tested except as a check on the blocking of the experiment. The whole purpose of a randomized block design is to reduce experimental error and obtain a more efficient test of (10.3.2). Therefore, if the statistic $F_B$ is nonsignificant, this is taken as evidence that the blocking was not carried out properly. In that case, the entire experiment should be repeated with more careful attention to the assignment of the treatments to the experimental units.²


Both Blocks and Treatments Random 

In a randomized block experiment, both blocks and treatments may be randomly chosen, and then we have a Model II or random model. Here, all the $\beta_i$'s, $\tau_j$'s, and $e_{ij}$'s are mutually and completely independent normal random variables with mean zero and variances $\sigma_\beta^2$, $\sigma_\tau^2$, and $\sigma_e^2$, respectively. The analysis of variance in this case is also given by Table 10.5. The hypotheses on $\sigma_\beta^2$ and $\sigma_\tau^2$ can be tested by the same statistics as in the case when both blocks and treatments are fixed.
Blocks Random and Treatments Fixed 
In a randomized block experiment, the blocks may be chosen randomly from a population of blocks, while the treatments are fixed. In this case, we have a Model III or mixed model, where the $\tau_j$'s are fixed constants with the restriction

$$\sum_{j=1}^{t} \tau_j = 0,$$

and the $\beta_i$'s and $e_{ij}$'s are mutually and completely independent normal random variables with mean zero and variances $\sigma_\beta^2$ and $\sigma_e^2$, respectively. The analysis of variance in this case is again given by Table 10.5. The hypotheses about $\sigma_\beta^2$ and the $\tau_j$'s can be tested as in the case when both blocks and treatments are fixed.

² For further discussions of this issue, see Lentner et al. (1989) and Samuels et al. (1991).


Blocks Fixed and Treatments Random 

In a randomized block experiment, the treatments may be chosen at random from a population of treatments, while the blocks are fixed. In this case, we again have a mixed model. The assumptions and tests of hypotheses are as given in the preceding case, with the roles of the $\beta_i$'s and $\tau_j$'s reversed.


Remark: It should be noted that if block effects had been ignored in the analysis, the analysis of variance would be the same as shown in Table 10.5, except that the block and residual sums of squares would be pooled, giving an error sum of squares equal to $SS_B + SS_E$ with $(b-1) + (b-1)(t-1) = t(b-1)$ degrees of freedom. The test of the hypothesis (10.3.2) would then be inefficient, since all the variation between blocks would be lumped with the experimental error. Furthermore, note that the analysis of variance model (10.3.1) for a randomized complete block design looks identical to the two-way crossed classification model (3.2.1) with one observation per cell. However, the assignment of experimental units to treatments in these two layouts is quite different. In a randomized block design, the $t$ treatments are randomized within each block, whereas in a two-way crossed model, the $a \times b$ treatment combinations are completely randomized to $a \times b$ experimental units. Thus, the interpretation of the two models is quite different. Of course, the randomized block design can be extended to problems involving two or more factors.


MISSING OBSERVATIONS 

The problem of missing data is treated similarly to that discussed in Section 3.10 
for the two-way classification with one observation per cell. 

RELATIVE EFFICIENCY OF THE DESIGN 

The relative efficiency (RE) of an experimental design in comparison to any other design can be evaluated in terms of the variance of the treatment means.³ In a randomized block design (RBD) with $b$ blocks and $t$ treatments, let $MS_B$ and $MS_E$ be the block and error mean squares, respectively. If a completely randomized design (CRD) were used with the same number $b \times t$ of experimental units as the RBD, then an estimate of the error variance would be obtained as

$$\frac{(b-1)MS_B + b(t-1)MS_E}{bt - 1}.$$

However, the error mean square of the RBD with the same number of experimental units is actually $MS_E$. Hence, the RE of the RBD compared to the CRD is given by

$$RE = \frac{(b-1)MS_B + b(t-1)MS_E}{(bt-1)MS_E}.$$

³ In general, the relative efficiency is defined as the ratio of two variances. Thus, given two estimators $T_1$ and $T_2$ of the same parameter, the relative efficiency of $T_1$ compared to $T_2$ is defined as $\mathrm{Var}(T_2)/\mathrm{Var}(T_1)$. The derivation in the text depends essentially upon this type of comparison of variances. For another approach to the relative efficiency of a design, see Cochran (1937).

TABLE 10.6
Data on Yields of Wheat Straw from a Randomized Block Experiment

                 Treatment
Block       1      2      3      4
  1       332    412    542    730
  2       260    384    472    590
  3       202    362    516    294
  4       210    348    458    560

Source: Anderson (1946). Used with permission.


REPLICATIONS 


In using a randomized block design (RBD), it is sometimes desirable to replicate each block-treatment combination on more than one experimental unit. Such a design is commonly known as a generalized randomized block design (GRBD). The principal advantage of the GRBD over the RBD is that it allows the estimation of interaction effects between blocks and treatments. The analysis of this design proceeds in exactly the same manner as for the two-way crossed classification with interaction discussed in Chapter 4, the only difference being that factor A now designates “blocks” and factor B denotes “treatments.”


WORKED EXAMPLE


Anderson (1946) reported data on the yields of wheat straw from an experiment using a randomized block design with four blocks and four treatments. A portion of the data is given in Table 10.6.
TABLE 10.7
Analysis of Variance for the Data on Yields of Wheat Straw

Source of     Degrees of    Sum of      Mean
Variation     Freedom       Squares     Square        F Value    p-Value
Block          3             54,362     18,120.667     2.60       0.117
Treatment      3            206,394     68,798.000     9.88       0.003
Error          9             62,700      6,966.667
Total         15            323,456


The analysis of variance calculations are readily performed and the results are 
summarized in Table 10.7. The outputs illustrating the applications of statistical 
packages to perform the analysis of variance are presented in Figure 10.3. Here, 
the ratio of mean squares for treatments is highly significant (p = 0.003) and 
there is very strong evidence of real treatment differences. The block effects 
seem to be insignificant (p = 0.117) and there may be some question regarding 
the effectiveness of the blocking. 
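The arithmetic behind Table 10.7 can be checked with a short Python sketch (not part of the original text; the function name `rbd_anova` is hypothetical). It applies the usual correction-factor formulas for a two-way classification with one observation per cell and also evaluates the relative-efficiency formula of this section, assuming the data are stored as a list of block rows.

```python
# Randomized complete block analysis of the wheat straw yields in Table 10.6.
# Rows are blocks 1-4; columns are treatments 1-4.
yields = [
    [332, 412, 542, 730],
    [260, 384, 472, 590],
    [202, 362, 516, 294],
    [210, 348, 458, 560],
]


def rbd_anova(y):
    """Sums of squares, F ratios, and the relative efficiency of an RBD
    (b blocks x t treatments, one observation per cell) versus a CRD."""
    b, t = len(y), len(y[0])
    grand = sum(sum(row) for row in y)
    cf = grand ** 2 / (b * t)                      # correction factor y..^2/(bt)
    ss_total = sum(v * v for row in y for v in row) - cf
    ss_block = sum(sum(row) ** 2 for row in y) / t - cf
    trt_totals = [sum(y[i][j] for i in range(b)) for j in range(t)]
    ss_trt = sum(tot ** 2 for tot in trt_totals) / b - cf
    ss_error = ss_total - ss_block - ss_trt
    ms_block = ss_block / (b - 1)
    ms_trt = ss_trt / (t - 1)
    ms_error = ss_error / ((b - 1) * (t - 1))
    # RE = [(b-1)MS_B + b(t-1)MS_E] / [(bt-1)MS_E]
    re = ((b - 1) * ms_block + b * (t - 1) * ms_error) / ((b * t - 1) * ms_error)
    return {
        "SS_block": ss_block,
        "SS_treatment": ss_trt,
        "SS_error": ss_error,
        "SS_total": ss_total,
        "F_block": ms_block / ms_error,
        "F_treatment": ms_trt / ms_error,
        "RE_vs_CRD": re,
    }


for name, value in rbd_anova(yields).items():
    print(f"{name:>12}: {value:12,.3f}")
```

The sums of squares agree with Table 10.7, and the relative efficiency comes out at about 1.32; read loosely, a completely randomized design would have needed roughly a third more replication to match the precision obtained here, even though the block F ratio itself was not significant.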


10.4 LATIN SQUARE DESIGN 


The randomized block design reduces experimental error by eliminating one source of variation among the experimental units through blocking. The Latin square design eliminates two extraneous sources of variation by using two-way, or double, blocking of the experimental units. The rows and columns serve as two mutually orthogonal systems of blocks, and the letters denote the treatments. In agricultural experiments, the rows and columns are usually strips of land, with the row strips at right angles to the column strips, and the plots are the intersections of strips running in different directions. In this sense, the Latin square is an extension of the randomized block design.

In general, a Latin square for $p$ treatments, or a $p \times p$ Latin square, is a square array with $p$ rows and $p$ columns. Each of the resulting $p^2$ cells contains one of the $p$ letters. Each letter corresponds to one of the treatments, and each letter occurs once and only once in each row and each column. A Latin square of any order can be obtained most easily by writing the letters in their natural order in the first column and then completing each row cyclically, that is, with the letters again in the same order except that the last letter is followed by the first. Some examples of Latin squares are given in Figure 10.4. Appendix X contains further Latin squares from $3 \times 3$ to $12 \times 12$. More examples are given in Norton (1939), Cochran and Cox (1957, pp. 145-146), and Fisher and Yates (1963, pp. 86-89).

(i) SAS application: SAS ANOVA instructions and output for the randomized block design.
(ii) SPSS application: SPSS MANOVA instructions and output for the randomized block design.
(iii) BMDP application: BMDP 2V instructions and output for the randomized block design.

[Program listings and output not reproduced. The packages report a model sum of squares of 260,756 with 6 degrees of freedom (F = 6.24, p = 0.0079, R² = 0.806), a block sum of squares of 54,362 (F = 2.60, p = 0.1165), a treatment sum of squares of 206,394 (F = 9.88, p = 0.0033), and an error sum of squares of 62,700 with 9 degrees of freedom.]

FIGURE 10.3 Program Instructions and Output for the Randomized Block Design: Data on Yields of Wheat Straw from a Randomized Block Design (Table 10.6).


3 x 3      4 x 4      5 x 5      6 x 6
ABC        ABCD       ABCDE      ABCDEF
BCA        BCDA       BCDEA      BCDEFA
CAB        CDAB       CDEAB      CDEFAB
           DABC       DEABC      DEFABC
                      EABCD      EFABCD
                                 FABCDE

FIGURE 10.4 Some Selected Latin Squares. 
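The cyclic construction that produces the squares in Figure 10.4 can be sketched in a few lines of Python (an illustration, not part of the original text; the helper names are hypothetical). Row $i$ is simply the alphabet rotated left by $i$ places, and a small checker confirms the defining property that each letter occurs once in every row and every column.

```python
import string


def cyclic_latin_square(p):
    """Build a p x p Latin square (p <= 26) by cyclic shifting: row i is the
    letters A, B, C, ... rotated left by i positions."""
    letters = string.ascii_uppercase[:p]
    return [letters[i:] + letters[:i] for i in range(p)]


def is_latin_square(square):
    """Check that every letter appears exactly once in each row and column."""
    p = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == p and set(row) == symbols for row in square)
    cols_ok = all({row[j] for row in square} == symbols for j in range(p))
    return len(symbols) == p and rows_ok and cols_ok


for p in (3, 4, 5, 6):
    sq = cyclic_latin_square(p)
    print(f"{p} x {p}:", " ".join(sq), "Latin:", is_latin_square(sq))
```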


For a given size $p$, many different $p \times p$ Latin squares can be constructed. For example, there are 576 different $4 \times 4$ Latin squares, 161,280 different $5 \times 5$ squares, 812,851,200 different $6 \times 6$ squares, and 61,479,419,904,000 different $7 \times 7$ squares, and the number of possible squares increases vastly as $p$ increases. The smallest Latin square that can be used is a $3 \times 3$ design. Latin squares larger than $9 \times 9$ are rarely used because of the difficulty of finding equal numbers of groups for the rows, columns, and treatments. The randomization procedures for Latin squares were first given by Yates (1937a) and are also described by Fisher and Yates (1963). The proper randomization scheme consists of selecting at random one of the Latin squares of the appropriate size from those available. Randomization can also be carried out by randomly permuting first the rows and then the columns, and finally randomly assigning the treatments to the letters.
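Both the counts and the permutation form of randomization can be illustrated with a Python sketch (our own, using only the standard library; all function names are hypothetical). The counting function enumerates reduced squares (first row and first column in natural order) by backtracking and multiplies by $p!\,(p-1)!$, a standard identity; the randomization function follows the permute-rows, permute-columns, relabel-letters scheme just described.

```python
import math
import random


def count_reduced_latin_squares(p):
    """Count reduced p x p Latin squares (first row and first column in
    natural order) by backtracking over the remaining cells."""
    grid = [[-1] * p for _ in range(p)]
    for k in range(p):
        grid[0][k] = k
        grid[k][0] = k
    cells = [(i, j) for i in range(1, p) for j in range(1, p)]

    def fill(n):
        if n == len(cells):
            return 1
        i, j = cells[n]
        found = 0
        for s in range(p):
            # s must not already occur in row i or in column j
            if s not in grid[i] and all(grid[r][j] != s for r in range(p)):
                grid[i][j] = s
                found += fill(n + 1)
                grid[i][j] = -1
        return found

    return fill(0)


def count_latin_squares(p):
    # Each Latin square maps to exactly one reduced square after ordering its
    # columns and its last p - 1 rows, so L(p) = p! (p-1)! R(p).
    return count_reduced_latin_squares(p) * math.factorial(p) * math.factorial(p - 1)


def randomize_latin_square(square, treatments, rng=random):
    """Permute the rows, then the columns, then assign treatments to the
    letters at random."""
    p = len(square)
    row_order = rng.sample(range(p), p)
    col_order = rng.sample(range(p), p)
    letters = sorted(set(square[0]))
    relabel = dict(zip(letters, rng.sample(list(treatments), p)))
    return [[relabel[square[r][c]] for c in col_order] for r in row_order]


for p in range(1, 6):
    print(f"{p} x {p}: {count_latin_squares(p):,} Latin squares")

rng = random.Random(1937)  # fixed seed so the layout can be reproduced
start = ["ABCD", "BCDA", "CDAB", "DABC"]
for row in randomize_latin_square(start, ["T1", "T2", "T3", "T4"], rng):
    print(" ".join(row))
```

The counts printed for $p = 4$ and $p = 5$ are 576 and 161,280, matching the figures above. Note that permuting rows, columns, and letters of one fixed starting square can only reach squares of the same isotopy class as that start, which is one reason the preferred scheme first selects a square at random from those available.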

Latin squares were first employed in agricultural experiments, where soil conditions often vary row-wise as well as column-wise. Treatments were applied in a field using a Latin square design in order to allow for differences in fertility in two directions across the field. However, the design was soon found to be useful in many other scientific and industrial experiments. Latin squares are often used to study the effects of three factors, where the factors corresponding to the rows and columns are of interest in themselves and are not introduced mainly to reduce experimental error. Note that a Latin square uses only $p^2$ experimental units instead of the $p^3$ units needed for a complete three-way layout. Thus, the Latin square design requires only a fraction $1/p$ of the observations of the complete three-way layout. However, this reduction is gained at the cost of assuming additivity, that is, the absence of interactions among the factors. Thus, in a Latin square it is very difficult (often impossible) to detect interactions between factors. To study interactions, other layouts such as factorial designs are needed.


MODEL AND ANALYSIS 


The analysis of variance model for a Latin square design is

$$y_{ijk} = \mu + \alpha_i + \beta_j + \tau_k + e_{ijk}, \qquad i = 1, 2, \ldots, p;\ j = 1, 2, \ldots, p;\ k = 1, 2, \ldots, p, \qquad (10.4.1)$$


where $y_{ijk}$ denotes the observed value corresponding to the $i$-th row, the $j$-th column, and the $k$-th treatment; $-\infty < \mu < \infty$ is the overall mean, $\alpha_i$ is the effect of the $i$-th row, $\beta_j$ is the effect of the $j$-th column, $\tau_k$ is the effect of the $k$-th treatment, and $e_{ijk}$ is the random error. The model is completely additive; that is, there are no interactions between rows, columns, and treatments. Furthermore, since there is only one observation in each cell, only two of the three subscripts $i$, $j$, and $k$ are needed to denote a particular observation. This is a consequence of each treatment appearing exactly once in each row and each column.

The analysis of variance consists of partitioning the total sum of squares of the $N = p^2$ observations into components due to rows, columns, treatments, and error by using the identity

$$y_{ijk} - \bar{y}_{...} = (\bar{y}_{i..} - \bar{y}_{...}) + (\bar{y}_{.j.} - \bar{y}_{...}) + (\bar{y}_{..k} - \bar{y}_{...}) + (y_{ijk} - \bar{y}_{i..} - \bar{y}_{.j.} - \bar{y}_{..k} + 2\bar{y}_{...}).$$

Squaring each side and summing over $i$, $j$, and $k$, and noting that $(i, j, k)$ takes on only $p^2$ values, we obtain

$$SS_T = SS_R + SS_C + SS_\tau + SS_E, \qquad (10.4.2)$$


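where (each triple sum extending only over the $p^2$ observed combinations of $i$, $j$, and $k$)

$$SS_T = \sum_{i=1}^{p}\sum_{j=1}^{p}\sum_{k=1}^{p} (y_{ijk} - \bar{y}_{...})^2,$$

$$SS_R = p\sum_{i=1}^{p} (\bar{y}_{i..} - \bar{y}_{...})^2, \qquad SS_C = p\sum_{j=1}^{p} (\bar{y}_{.j.} - \bar{y}_{...})^2, \qquad SS_\tau = p\sum_{k=1}^{p} (\bar{y}_{..k} - \bar{y}_{...})^2,$$

and

$$SS_E = \sum_{i=1}^{p}\sum_{j=1}^{p}\sum_{k=1}^{p} (y_{ijk} - \bar{y}_{i..} - \bar{y}_{.j.} - \bar{y}_{..k} + 2\bar{y}_{...})^2.$$

TABLE 10.8
Analysis of Variance for the Latin Square Design

Source of    Degrees of     Sum of      Mean        Expected Mean Square*                                                              F Value
Variation    Freedom        Squares     Square      Model I                                               Model II
Row          $p-1$          $SS_R$      $MS_R$      $\sigma_e^2 + \frac{p}{p-1}\sum_{i=1}^{p}\alpha_i^2$  $\sigma_e^2 + p\sigma_\alpha^2$    $MS_R/MS_E$
Column       $p-1$          $SS_C$      $MS_C$      $\sigma_e^2 + \frac{p}{p-1}\sum_{j=1}^{p}\beta_j^2$   $\sigma_e^2 + p\sigma_\beta^2$     $MS_C/MS_E$
Treatment    $p-1$          $SS_\tau$   $MS_\tau$   $\sigma_e^2 + \frac{p}{p-1}\sum_{k=1}^{p}\tau_k^2$    $\sigma_e^2 + p\sigma_\tau^2$      $MS_\tau/MS_E$
Error        $(p-1)(p-2)$   $SS_E$      $MS_E$      $\sigma_e^2$                                          $\sigma_e^2$
Total        $p^2-1$        $SS_T$

* The expected mean squares for the mixed model are not shown, but they can be obtained by replacing the appropriate term by the corresponding term as one changes from a fixed to a random effect; for example, replacing $\sum_{i=1}^{p}\alpha_i^2/(p-1)$ by $\sigma_\alpha^2$.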


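The corresponding degrees of freedom are partitioned as

$$\underbrace{p^2 - 1}_{\text{Total}} = \underbrace{(p-1)}_{\text{Rows}} + \underbrace{(p-1)}_{\text{Columns}} + \underbrace{(p-1)}_{\text{Treatments}} + \underbrace{(p-1)(p-2)}_{\text{Error}}.$$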


The usual assumptions of the fixed effects model are

$$\sum_{i=1}^{p} \alpha_i = \sum_{j=1}^{p} \beta_j = \sum_{k=1}^{p} \tau_k = 0,$$

and the $e_{ijk}$'s are normal random variables with mean zero and variance $\sigma_e^2$. Under the assumptions of the random effects model, the $\alpha_i$'s, $\beta_j$'s, and $\tau_k$'s are also normal random variables with mean zero and variances $\sigma_\alpha^2$, $\sigma_\beta^2$, and $\sigma_\tau^2$, respectively. Other assumptions leading to a mixed model can also be made. The expected mean squares are now readily derived, and the complete analysis of variance is shown in Table 10.8. Furthermore, it can be shown that under the fixed effects model, each sum of squares on the right-hand side of (10.4.2) divided by $\sigma_e^2$ is an independently distributed chi-square random variable.
Under Model I, the appropriate statistic for testing the hypothesis of no treatment effects, that is,

$$H_0^{\tau}: \tau_1 = \tau_2 = \cdots = \tau_p = 0 \quad \text{versus} \quad H_1^{\tau}: \tau_k \neq 0 \text{ for at least one } k,\ k = 1, 2, \ldots, p,$$

is

$$F_\tau = MS_\tau / MS_E,$$

which is distributed as $F[p-1, (p-1)(p-2)]$ under the null hypothesis and as $F'[p-1, (p-1)(p-2); \lambda]$ under the alternative, where

$$\lambda = \frac{p}{2\sigma_e^2}\sum_{k=1}^{p} \tau_k^2.$$

The only hypothesis generally of interest in a Latin square design is the one concerning the equality of treatment effects under Model I, as given previously. One may also test for no row effects and no column effects by forming the ratios $MS_R/MS_E$ and $MS_C/MS_E$. However, since the rows and columns represent restrictions on randomization, these tests may not be appropriate. If rows and columns represent factors and any real interactions are present, they will inflate $MS_E$ and make the tests less sensitive. If $p < 4$, the design is considered inadequate, since it provides too few degrees of freedom for estimating experimental error (a $3 \times 3$ square leaves only $(p-1)(p-2) = 2$ degrees of freedom for error).


POINT AND INTERVAL ESTIMATION 


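Estimates of various parameters of interest in a Latin square design are readily obtained along with their sampling variances. For example, under Model I, we have

$$\hat{\mu} = \bar{y}_{...}, \qquad \mathrm{Var}(\bar{y}_{...}) = \sigma_e^2/p^2;$$

$$\widehat{\mu + \alpha_i} = \bar{y}_{i..}, \qquad \mathrm{Var}(\bar{y}_{i..}) = \sigma_e^2/p;$$

$$\hat{\alpha}_i = \bar{y}_{i..} - \bar{y}_{...}, \qquad \mathrm{Var}(\bar{y}_{i..} - \bar{y}_{...}) = (p-1)\sigma_e^2/p^2;$$

$$\widehat{\alpha_i - \alpha_{i'}} = \bar{y}_{i..} - \bar{y}_{i'..}, \qquad \mathrm{Var}(\bar{y}_{i..} - \bar{y}_{i'..}) = 2\sigma_e^2/p;$$

$$\widehat{\sum_{i=1}^{p} \ell_i \alpha_i} = \sum_{i=1}^{p} \ell_i \bar{y}_{i..}, \qquad \mathrm{Var}\Big(\sum_{i=1}^{p} \ell_i \bar{y}_{i..}\Big) = \Big(\sum_{i=1}^{p} \ell_i^2\Big)\sigma_e^2/p \qquad \Big(\sum_{i=1}^{p} \ell_i = 0\Big);$$

$$\hat{\sigma}_e^2 = MS_E, \qquad \mathrm{Var}(MS_E) = 2\sigma_e^4/[(p-1)(p-2)].$$

A $100(1-\alpha)$ percent confidence interval for $\sigma_e^2$ is given by

$$\frac{(p-1)(p-2)MS_E}{\chi^2[(p-1)(p-2), 1-\alpha/2]} < \sigma_e^2 < \frac{(p-1)(p-2)MS_E}{\chi^2[(p-1)(p-2), \alpha/2]}.$$

Confidence intervals for the fixed effects parameters considered previously can be constructed from their sampling variances, with $\sigma_e^2$ estimated by $MS_E$ on $(p-1)(p-2)$ degrees of freedom.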


POWER OF THE F TEST 


One can calculate the power of the test in the same manner as discussed in earlier chapters. For example, the noncentrality parameter $\phi$ with respect to the hypothesis of no treatment effects is given by

$$\phi = \frac{1}{\sigma_e}\sqrt{\frac{p\sum_{k=1}^{p}\tau_k^2}{p}}.$$

Here, $\nu_1 = p - 1$ and $\nu_2 = (p-1)(p-2)$. Except for this modification, the power calculations remain unchanged.
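Instead of reading power from charts, it can be computed directly from the noncentral F distribution. The Python sketch below is our own illustration using only the standard library: the incomplete beta function is evaluated by the usual continued-fraction method (as in Numerical Recipes), the noncentral F distribution function is written as a Poisson mixture of central ones, and the critical value is found by bisection. The noncentrality passed to the code is assumed to be $nc = p\sum_k \tau_k^2/\sigma_e^2$, the parametrization in which the numerator chi-square has mean $\nu_1 + nc$; all function names and the numerical example are hypothetical.

```python
import math


def _nonzero(v, tiny=1e-300):
    # Guard against division by zero inside the continued fraction.
    return v if abs(v) > tiny else tiny


def _betacf(a, b, x):
    # Continued fraction for the incomplete beta function (modified Lentz).
    c = 1.0
    d = 1.0 / _nonzero(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, 1000):
        m2 = 2 * m
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def ncf_cdf(f, v1, v2, nc=0.0):
    """P(F' <= f) for F'(v1, v2; nc): a Poisson(nc/2) mixture of betas."""
    if f <= 0.0:
        return 0.0
    x = v1 * f / (v1 * f + v2)
    half = nc / 2.0
    weight, total = math.exp(-half), 0.0
    for j in range(10_000):
        total += weight * betainc(v1 / 2.0 + j, v2 / 2.0, x)
        weight *= half / (j + 1)
        if j > half and weight < 1e-16:
            break
    return total


def f_critical(v1, v2, alpha):
    """Upper alpha point of the central F(v1, v2) distribution (bisection)."""
    lo, hi = 0.0, 1.0
    while ncf_cdf(hi, v1, v2) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if ncf_cdf(mid, v1, v2) < 1.0 - alpha else (lo, mid)
    return (lo + hi) / 2.0


def latin_square_power(tau, sigma_e, alpha=0.05):
    """Power of the treatment F test in a p x p Latin square, assuming fixed
    treatment effects tau (summing to zero) and error s.d. sigma_e."""
    p = len(tau)
    v1, v2 = p - 1, (p - 1) * (p - 2)
    nc = p * sum(t * t for t in tau) / sigma_e ** 2
    crit = f_critical(v1, v2, alpha)
    return 1.0 - ncf_cdf(crit, v1, v2, nc)


print(round(f_critical(4, 12, 0.05), 3))  # tabled F value is about 3.26
print(latin_square_power([-50, -25, 0, 25, 50], sigma_e=60.0))
```

With all $\tau_k = 0$ the computed power reduces to $\alpha$, which is a useful check on the routine; in practice a statistical library would normally be used for the distribution functions.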


MULTIPLE COMPARISONS 


For Models I and III, the Tukey, Scheffé, and other procedures described in Section 2.19 may be readily adapted for use with the Latin square design. For example, consider a contrast of the form

$$L = \sum_{k=1}^{p} \ell_k \tau_k, \qquad \sum_{k=1}^{p} \ell_k = 0,$$

which is estimated by

$$\hat{L} = \sum_{k=1}^{p} \ell_k \bar{y}_{..k}.$$

Then, using Tukey's method, $L$ is significantly different from zero with confidence coefficient $1 - \alpha$ if

$$\frac{|\hat{L}|}{\sqrt{p^{-1}MS_E}\left(\frac{1}{2}\sum_{k=1}^{p} |\ell_k|\right)} > q[p, (p-1)(p-2); 1-\alpha].$$

If Scheffé's method is applied to these comparisons, then $L$ is significantly different from zero with confidence coefficient $1 - \alpha$ if

$$\frac{|\hat{L}|}{\sqrt{(p-1)MS_E\left(\sum_{k=1}^{p} \ell_k^2/p\right)}} > \{F[p-1, (p-1)(p-2); 1-\alpha]\}^{1/2}.$$

Similar modifications are made for other contrasts and procedures.
COMPUTATIONAL FORMULAE

The computation of sums of squares can be performed easily by using the following computational formulae:

$$SS_R = \frac{1}{p}\sum_{i=1}^{p} y_{i..}^2 - \frac{y_{...}^2}{p^2}, \qquad SS_C = \frac{1}{p}\sum_{j=1}^{p} y_{.j.}^2 - \frac{y_{...}^2}{p^2}, \qquad SS_\tau = \frac{1}{p}\sum_{k=1}^{p} y_{..k}^2 - \frac{y_{...}^2}{p^2},$$

$$SS_T = \sum_{i=1}^{p}\sum_{j=1}^{p}\sum_{k=1}^{p} y_{ijk}^2 - \frac{y_{...}^2}{p^2},$$

and

$$SS_E = SS_T - SS_R - SS_C - SS_\tau.$$
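As an illustration (a sketch of ours, not part of the original text), the computational formulae translate directly into Python. Here they are applied to the monkey data of Table 10.9, stored as (response, stimulus letter) pairs with monkeys as rows and weeks as columns; the function name `latin_square_anova` is hypothetical.

```python
# Table 10.9: rows are monkeys 1-5, columns are weeks 1-5; each entry is
# (response, stimulus letter).
monkeys = [
    [(194, "B"), (369, "D"), (344, "C"), (380, "A"), (693, "E")],
    [(202, "D"), (142, "B"), (200, "A"), (356, "E"), (473, "C")],
    [(335, "C"), (301, "A"), (493, "E"), (338, "B"), (528, "D")],
    [(515, "E"), (590, "C"), (552, "B"), (677, "D"), (546, "A")],
    [(184, "A"), (421, "E"), (355, "D"), (284, "C"), (366, "B")],
]


def latin_square_anova(grid):
    """Latin square ANOVA via the computational formulae.
    Returns {source: (SS, df, MS, F)} with None where not applicable."""
    p = len(grid)
    values = [v for row in grid for v, _ in row]
    cf = sum(values) ** 2 / p ** 2                         # y...^2 / p^2
    row_tot = [sum(v for v, _ in row) for row in grid]     # y_i..
    col_tot = [sum(grid[i][j][0] for i in range(p)) for j in range(p)]  # y_.j.
    trt_tot = {}                                           # y_..k
    for row in grid:
        for v, letter in row:
            trt_tot[letter] = trt_tot.get(letter, 0) + v
    ss_r = sum(t * t for t in row_tot) / p - cf
    ss_c = sum(t * t for t in col_tot) / p - cf
    ss_tau = sum(t * t for t in trt_tot.values()) / p - cf
    ss_t = sum(v * v for v in values) - cf
    ss_e = ss_t - ss_r - ss_c - ss_tau
    df, df_e = p - 1, (p - 1) * (p - 2)
    ms_e = ss_e / df_e
    parts = {"Rows": ss_r, "Columns": ss_c, "Treatments": ss_tau}
    out = {name: (ss, df, ss / df, ss / df / ms_e) for name, ss in parts.items()}
    out["Error"] = (ss_e, df_e, ms_e, None)
    out["Total"] = (ss_t, p * p - 1, None, None)
    return out


for source, (ss, df, ms, f) in latin_square_anova(monkeys).items():
    ms_txt = f"{ms:12,.3f}" if ms is not None else " " * 12
    f_txt = f"{f:7.2f}" if f is not None else ""
    print(f"{source:<11}{df:>3}{ss:14,.3f}{ms_txt}{f_txt}")
```

The rows (monkeys), columns (weeks), and treatments (stimuli) reproduce the sums of squares and F values reported in Table 10.10.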


MISSING OBSERVATIONS 


When a single observation $y_{ijk}$ is missing, its value is estimated by

$$\hat{y}_{ijk} = \frac{p(y'_{i..} + y'_{.j.} + y'_{..k}) - 2y'_{...}}{(p-1)(p-2)}, \qquad (10.4.3)$$

where the primes indicate the previously defined totals with one observation missing. After substituting the estimate (10.4.3) for the missing value, the sums of squares are calculated in the usual way. To correct the treatment mean square for possible bias, the quantity

$$\frac{[y'_{...} - y'_{i..} - y'_{.j.} - (p-1)y'_{..k}]^2}{(p-1)^2(p-2)^2}$$

is subtracted from the treatment mean square. The variance of the mean of the treatment with a missing value is

$$\mathrm{Var}(\bar{y}_{..k}) = \sigma_e^2\left[\frac{1}{p} + \frac{1}{(p-1)(p-2)}\right],$$

and the variance of the difference between two treatment means (one of which involves the missing value) is

$$\sigma_e^2\left[\frac{2}{p} + \frac{1}{(p-1)(p-2)}\right],$$

which is slightly larger than the usual expression $2\sigma_e^2/p$ for the case of no missing value.
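Formula (10.4.3) and the bias correction are easy to script. The Python sketch below is an illustration under stated assumptions (the grid holds `None` in the missing cell and a parallel grid of treatment letters is available; the function name `estimate_missing` is hypothetical). As a check, it uses artificial, exactly additive data, for which the estimate must reproduce the deleted value.

```python
def estimate_missing(values, letters, i, j):
    """Estimate a single missing observation in cell (i, j) of a p x p Latin
    square by formula (10.4.3), and return the quantity to be subtracted from
    the treatment mean square. `values` holds None in the missing cell and
    `letters[a][b]` is the treatment letter of cell (a, b)."""
    p = len(values)
    k = letters[i][j]
    present = [(a, b) for a in range(p) for b in range(p) if values[a][b] is not None]
    row = sum(values[a][b] for a, b in present if a == i)              # y'_i..
    col = sum(values[a][b] for a, b in present if b == j)              # y'_.j.
    trt = sum(values[a][b] for a, b in present if letters[a][b] == k)  # y'_..k
    grand = sum(values[a][b] for a, b in present)                      # y'_...
    estimate = (p * (row + col + trt) - 2 * grand) / ((p - 1) * (p - 2))
    bias = (grand - row - col - (p - 1) * trt) ** 2 / ((p - 1) ** 2 * (p - 2) ** 2)
    return estimate, bias


# Artificial, exactly additive 4 x 4 example (row + column + treatment effects).
letters = ["ABCD", "BCDA", "CDAB", "DABC"]
row_eff, col_eff = [10, 20, 30, 40], [1, 2, 3, 4]
trt_eff = {"A": 100, "B": 200, "C": 300, "D": 400}
values = [[row_eff[a] + col_eff[b] + trt_eff[letters[a][b]] for b in range(4)]
          for a in range(4)]
true_value = values[1][2]
values[1][2] = None  # pretend this observation was lost
est, bias = estimate_missing(values, letters, 1, 2)
print(true_value, est, bias)
```

Because the formula minimizes the error sum of squares, data with no error term leave a zero residual in the filled cell, so the estimate equals the deleted observation exactly; with real data the estimate is then substituted and the analysis carried out with one fewer error degree of freedom.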

For several missing values, more complicated methods are generally required. Formulae giving explicit expressions for several missing values can be found in Kramer and Glass (1960). However, for a few missing values, an iterative scheme may be used, in which formula (10.4.3) is applied repeatedly. When all missing values have been estimated, the analysis of variance is performed in the usual way, with the number of missing values subtracted from the total and error degrees of freedom. Detailed discussions on handling cases with two or more missing values can be found in Steel and Torrie (1980, pp. 227-228) and Hinkelmann and Kempthorne (1994, Chapter 10). The analysis of the design when a single row, column, or treatment is missing is given by Yates (1936b). The methods of analysis when more than one row, column, or treatment is missing are described by Yates and Hale (1939) and DeLury (1946).


TESTS FOR INTERACTION 


Tukey (1955) and Abraham (1960) have generalized Tukey’s one degree of 
freedom test for nonadditivity to Latin squares. Snedecor and Cochran (1989, 
pp. 291-294) and Neter et al. (1990, pp. 1096-1098) provide some additional 
details and numerical examples. For some further discussion of the topic, see 
Milliken and Graybill (1972). Effects of nonadditivity in Latin squares have 
been discussed by Wilk and Kempthorne (1957) and Cox (1958b). 


RELATIVE EFFICIENCY OF THE DESIGN 


Suppose that, instead of a Latin square design (LSD), a randomized block design (RBD) with the $p$ rows as blocks is used. An estimate of the error variance would then be given by

$$\frac{(p-1)MS_C + (p-1)^2 MS_E}{p(p-1)}.$$

The preceding formula follows from the fact that the column mean square would be pooled with the error mean square, as there are no columns in the RBD. However, the LSD under the same experimental conditions actually has the error mean square $MS_E$. Hence, the relative efficiency (RE) of the LSD relative to the RBD with rows as blocks (called column efficiency) is given by

$$RE_{\text{column}} = \frac{(p-1)MS_C + (p-1)^2 MS_E}{p(p-1)MS_E} = \frac{MS_C + (p-1)MS_E}{p\,MS_E}.$$
Similarly, if the columns are treated as blocks, then the RE of the LSD relative to the RBD (called row efficiency) is given by

$$RE_{\text{row}} = \frac{MS_R + (p-1)MS_E}{p\,MS_E}.$$
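The two efficiencies can be evaluated with a small Python sketch (ours, with a hypothetical function name), here using the mean squares of Table 10.10 for the monkey experiment, where rows are monkeys, columns are weeks, and $p = 5$.

```python
def latin_square_efficiencies(ms_rows, ms_cols, ms_error, p):
    """Relative efficiency of a p x p Latin square versus an RBD.
    Column efficiency: RBD with the rows as blocks (columns ignored).
    Row efficiency: RBD with the columns as blocks (rows ignored)."""
    re_column = (ms_cols + (p - 1) * ms_error) / (p * ms_error)
    re_row = (ms_rows + (p - 1) * ms_error) / (p * ms_error)
    return re_column, re_row


# Mean squares from Table 10.10: monkeys (rows), weeks (columns), error.
re_col, re_row = latin_square_efficiencies(65740.26, 36128.86, 3552.36, 5)
print(f"column efficiency: {re_col:.2f}")
print(f"row efficiency:    {re_row:.2f}")
```

The values are roughly 2.83 and 4.50, so removing either the week or the monkey classification would have cost a substantial amount of precision in this experiment.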


REPLICATIONS 


In using a small Latin square, it is often desirable to replicate it. The usual model for a Latin square with $r$ replications is

$$y_{ijk\ell} = \mu + \alpha_i + \beta_j + \tau_k + \rho_\ell + e_{ijk\ell}, \qquad i, j, k = 1, 2, \ldots, p;\ \ell = 1, 2, \ldots, r,$$

where $y_{ijk\ell}$ denotes the observed value corresponding to the $i$-th row, the $j$-th column, the $k$-th treatment, and the $\ell$-th replication; $-\infty < \mu < \infty$ is the overall mean, $\alpha_i$ is the effect of the $i$-th row, $\beta_j$ is the effect of the $j$-th column, $\tau_k$ is the effect of the $k$-th treatment, $\rho_\ell$ is the effect of the $\ell$-th replication, and $e_{ijk\ell}$ is the customary error term. When a Latin square is replicated, it is important to know whether it is replicated using the same blocking variables or whether there are additional versions of one or both blocking variables. The analysis of variance for the case in which a Latin square is replicated $r$ times using the same blocking variables proceeds in the same manner as before; however, an additional source of variation due to replicates is now introduced. The degrees of freedom for the rows, columns, and treatments are the same, i.e., $p - 1$ each, but the degrees of freedom for the total, replicates, and error are $rp^2 - 1$, $r - 1$, and $(p-1)[r(p+1) - 3]$, respectively. When a Latin square is replicated with additional versions of the row (column) blocking variable, the analysis remains the same, except that the degrees of freedom for the rows (columns) and error are now $r(p-1)$ and $(p-1)(rp-2)$, respectively. When a Latin square is replicated with additional versions of both the row and column blocking variables, the degrees of freedom for the rows, columns, and error are $r(p-1)$, $r(p-1)$, and $(p-1)[r(p-1) - 1]$, respectively.
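The three degree-of-freedom breakdowns can be tabulated with a short Python sketch (an illustration of ours; the function name and the scheme labels "same", "rows", and "both" are hypothetical). The built-in check that the components add to $rp^2 - 1$ is a convenient way to confirm each breakdown.

```python
def replicated_latin_square_df(p, r, scheme):
    """Degrees of freedom for a p x p Latin square replicated r times.

    scheme = "same": the same row and column blocks in every replicate
             "rows": new row blocks in each replicate (columns kept)
             "both": new row and column blocks in each replicate
    """
    rows = r * (p - 1) if scheme in ("rows", "both") else p - 1
    cols = r * (p - 1) if scheme == "both" else p - 1
    error = {
        "same": (p - 1) * (r * (p + 1) - 3),
        "rows": (p - 1) * (r * p - 2),
        "both": (p - 1) * (r * (p - 1) - 1),
    }[scheme]
    df = {"replicates": r - 1, "rows": rows, "columns": cols,
          "treatments": p - 1, "error": error}
    df["total"] = r * p * p - 1
    # The components must add up to the total degrees of freedom.
    assert sum(v for key, v in df.items() if key != "total") == df["total"]
    return df


for scheme in ("same", "rows", "both"):
    print(scheme, replicated_latin_square_df(4, 3, scheme))
```

For three replicates of a $4 \times 4$ square, the error degrees of freedom drop from 36 to 30 to 24 as more blocking classifications are renewed in each replicate.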


Remark: Latin squares were proposed as experimental designs by R. A. Fisher (1925, 1926), and in 1924 he made some early applications of Latin squares in the design of an experiment in a forest nursery. A Latin square experiment for testing the differences among four treatments for warp breakage, where time periods and looms were used as rows and columns, has been described by Tippett (1931). Davies (1954) describes one of the earliest industrial applications of Latin squares, related to wear-testing experiments on four materials, where the runs and positions of a machine were represented as rows and columns. For a survey of Latin square designs in agricultural experiments, see Street and Street (1988). The principal reference book on Latin squares is by Dénes and Keedwell (1974). For a discussion of combinatorial problems in Latin squares, see Street and Street (1987).

TABLE 10.9
Data on Responses of Monkeys to Different Stimulus Conditions

                                  Week
Monkey       1          2          3          4          5
  1       194 (B)    369 (D)    344 (C)    380 (A)    693 (E)
  2       202 (D)    142 (B)    200 (A)    356 (E)    473 (C)
  3       335 (C)    301 (A)    493 (E)    338 (B)    528 (D)
  4       515 (E)    590 (C)    552 (B)    677 (D)    546 (A)
  5       184 (A)    421 (E)    355 (D)    284 (C)    366 (B)

Source: Snedecor (1955). Used with permission.

TABLE 10.10
Analysis of Variance for the Data on Responses of Monkeys to Different Stimulus Conditions

Source of     Degrees of    Sum of         Mean
Variation     Freedom       Squares        Square        F Value    p-Value
Monkey         4            262,961.040    65,740.260    18.51      <0.001
Week           4            144,515.440    36,128.860    10.17      <0.001
Stimulus       4            111,771.440    27,942.860     7.87       0.002
Error         12             42,628.320     3,552.360
Total         24            561,876.240


WORKED EXAMPLE 


Snedecor (1955) reported data from an experiment conducted to study the responses of pairs of monkeys to a certain kind of stimulus under a variety of conditions. The responses were measured on five pairs of monkeys during five successive weeks under five different conditions, using a Latin square design. The data are given in Table 10.9, where the letter within parentheses represents the stimulus condition used.

The analysis of variance calculations are readily performed and the results are 
summarized in Table 10.10. The outputs illustrating the applications of statis- 
tical packages to perform the analysis of variance are presented in Figure 10.5. 


506 The Analysis of Variance 


(i) SAS application: SAS ANOVA instructions and output for the Latin square design.
(ii) SPSS application: SPSS MANOVA instructions and output for the Latin square design.
(iii) BMDP application: BMDP 2V instructions and output for the Latin square design.

[Program listings and output not reproduced. The packages report a model sum of squares of 519,247.92 with 12 degrees of freedom (F = 12.18, p = 0.0001, R² = 0.924), together with the monkey, week, and stimulus sums of squares and F values shown in Table 10.10.]

FIGURE 10.5 Program Instructions and Output for the Latin Square Design: Data on Responses of Monkeys to Different Stimulus Conditions (Table 10.9).
Here, there is a highly significant effect due to stimulus conditions (p = 0.002). The effects due to monkeys and weeks are also highly significant (p < 0.001), so the double blocking has removed substantial extraneous variation, and the use of the Latin square design seems to have been highly effective.


10.5 GRAECO-LATIN SQUARE DESIGN 


We have seen that the Latin square design is effective for controlling two sources
of external variation. The principle can be extended to control more sources of
variation. The Graeco-Latin square is one such design; it can be used to control
three sources of variation. The design is also useful for investigating the
simultaneous effects of four factors (rows, columns, Latin letters, and Greek
letters) in a single experiment. The Graeco-Latin square design is obtained by
juxtaposing or superimposing two Latin squares, one with treatments denoted
by Latin letters and the other with treatments denoted by Greek letters, such
that each Latin letter appears once and only once with each Greek letter. Such
designs have been constructed for all numbers of treatments* from 3 to 12 except 6.
Some selected Graeco-Latin squares are shown in Figure 10.6. More examples
are given in Appendix Y, Cochran and Cox (1957, pp. 146-147), and Fisher
and Yates (1963, pp. 86-89).
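The superimposition just described can be sketched in Python. As an illustration only (this cyclic construction is a standard one, but it is not claimed to be how the squares in Figure 10.6 were produced), for odd p the squares with entries (i + j) mod p and (2i + j) mod p are orthogonal Latin squares. The sketch builds them, superimposes them, and checks that each square is Latin and that every Latin-Greek pair occurs exactly once; the symbol strings and function names are assumptions of the example.

```python
LATIN = "ABCDEFGHIJKL"
GREEK = "αβγδεζηθικλμ"


def is_latin(square):
    """True if each row and each column contains every symbol exactly once."""
    p = len(square)
    symbols = set(square[0])
    if len(symbols) != p:
        return False
    rows_ok = all(len(row) == p and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(p)} == symbols for j in range(p))
    return rows_ok and cols_ok


def is_graeco_latin(latin, greek):
    """Both squares are Latin and every (Latin, Greek) pair occurs exactly once."""
    p = len(latin)
    pairs = {(latin[i][j], greek[i][j]) for i in range(p) for j in range(p)}
    return is_latin(latin) and is_latin(greek) and len(pairs) == p * p


def cyclic_graeco_latin(p):
    """Superimpose the squares (i + j) mod p and (2i + j) mod p (orthogonal for odd p)."""
    if p % 2 == 0 or p > len(LATIN):
        raise ValueError("this cyclic construction needs an odd p no larger than 11")
    latin = [[LATIN[(i + j) % p] for j in range(p)] for i in range(p)]
    greek = [[GREEK[(2 * i + j) % p] for j in range(p)] for i in range(p)]
    return latin, greek


latin, greek = cyclic_graeco_latin(5)
for i in range(5):
    print(" ".join(latin[i][j] + greek[i][j] for j in range(5)))
```

The same `is_graeco_latin` check can be applied to any of the squares in Figure 10.6, including the 4 x 4 square, which this cyclic rule cannot produce because it requires an odd p.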


3 x 3             4 x 4                 5 x 5

Aα Bγ Cβ          Aα Bγ Cδ Dβ           Aα Bγ Cε Dβ Eδ
Bβ Cα Aγ          Bβ Aδ Dγ Cα           Bβ Cδ Dα Eγ Aε
Cγ Aβ Bα          Cγ Dα Aβ Bδ           Cγ Dε Eβ Aδ Bα
                  Dδ Cβ Bα Aγ           Dδ Eα Aγ Bε Cβ
                                        Eε Aβ Bδ Cα Dγ


FIGURE 10.6 Some Selected Graeco-Latin Squares. 


MODEL AND ANALYSIS 


The analysis of variance model for a Graeco-Latin square design is

$$
y_{ijk\ell} = \mu + \alpha_i + \beta_j + \tau_k + \delta_\ell + e_{ijk\ell},
\qquad i, j, k, \ell = 1, 2, \ldots, p,
$$


* Graeco-Latin squares exist for all orders except 1, 2, and 6. The problem of the nonexistence of
Graeco-Latin squares for certain values of p goes back more than 200 years, to the Swiss
mathematician Euler (1782), who conjectured that no p x p Graeco-Latin square exists for p = 4m + 2,
where m is a positive integer. In 1900, Euler's conjecture was shown to be true for m = 1; that
is, there does not exist a 6 x 6 Graeco-Latin square. However, the conjecture was shown to be
false for all m ≥ 2 by Bose and Shrikhande (1959) and Parker (1959).
where $y_{ijk\ell}$ is the observation corresponding to the i-th row, the j-th column,
the k-th Greek letter, and the $\ell$-th Latin letter; $-\infty < \mu < \infty$ is the overall
mean, $\alpha_i$ is the effect of the i-th row, $\beta_j$ is the effect of the j-th column, $\tau_k$ is
the effect of the k-th Greek letter, $\delta_\ell$ is the effect of the $\ell$-th Latin letter, and
$e_{ijk\ell}$ is the random error. The model assumes additivity of the effects of all four
factors; that is, there are no interactions between rows, columns, Greek letters,
and Latin letters. Furthermore, note that only two of the four subscripts i, j, k,
and $\ell$ are needed to identify a particular observation. This is a consequence of
each Greek letter appearing exactly once with each Latin letter and in each row
and column.

The analysis of variance of the design is very similar to that of the Latin square
design. The partitioning of the total sum of squares of the $N = p^2$ observations
into components due to rows, columns, Greek letters, Latin letters, and error is
given by


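$$
SS_T = SS_R + SS_C + SS_G + SS_L + SS_E,
$$

where

$$
\begin{aligned}
SS_T &= \sum_{i=1}^{p}\sum_{j=1}^{p}\sum_{k=1}^{p}\sum_{\ell=1}^{p} (y_{ijk\ell} - \bar{y}_{....})^2, \\
SS_R &= p\sum_{i=1}^{p} (\bar{y}_{i...} - \bar{y}_{....})^2, \\
SS_C &= p\sum_{j=1}^{p} (\bar{y}_{.j..} - \bar{y}_{....})^2, \\
SS_G &= p\sum_{k=1}^{p} (\bar{y}_{..k.} - \bar{y}_{....})^2, \\
SS_L &= p\sum_{\ell=1}^{p} (\bar{y}_{...\ell} - \bar{y}_{....})^2,
\end{aligned}
$$

and

$$
SS_E = SS_T - SS_R - SS_C - SS_G - SS_L.
$$

Here the quadruple sum in $SS_T$ runs only over the $N = p^2$ combinations of subscripts that actually occur in the design.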

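The corresponding degrees of freedom are partitioned as

$$
\underbrace{p^2 - 1}_{\text{Total}}
= \underbrace{(p-1)}_{\text{Rows}}
+ \underbrace{(p-1)}_{\text{Columns}}
+ \underbrace{(p-1)}_{\text{Greek letters}}
+ \underbrace{(p-1)}_{\text{Latin letters}}
+ \underbrace{(p-1)(p-3)}_{\text{Error}}.
$$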
TABLE 10.11
Analysis of Variance for the Graeco-Latin Square Design

Source of      Degrees of        Sum of    Mean     Expected Mean Square*                              F Value
Variation      Freedom           Squares   Square   Model I                       Model II
Row            p - 1             SS_R      MS_R     σ_e² + [p/(p - 1)] Σ_i α_i²   σ_e² + pσ_α²     MS_R/MS_E
Column         p - 1             SS_C      MS_C     σ_e² + [p/(p - 1)] Σ_j β_j²   σ_e² + pσ_β²     MS_C/MS_E
Greek letter   p - 1             SS_G      MS_G     σ_e² + [p/(p - 1)] Σ_k τ_k²   σ_e² + pσ_τ²     MS_G/MS_E
Latin letter   p - 1             SS_L      MS_L     σ_e² + [p/(p - 1)] Σ_ℓ δ_ℓ²   σ_e² + pσ_δ²     MS_L/MS_E
Error          (p - 1)(p - 3)    SS_E      MS_E     σ_e²                          σ_e²
Total          p² - 1            SS_T

(All sums run from 1 to p.)

* The expected mean squares for the mixed model are not shown, but they can be obtained by
replacing the appropriate term by the corresponding term as one changes from a fixed to a random
effect; for example, replacing Σ_i α_i²/(p - 1) by σ_α².


The usual assumptions of the fixed effects model are

$$
\sum_{i=1}^{p}\alpha_i = \sum_{j=1}^{p}\beta_j = \sum_{k=1}^{p}\tau_k = \sum_{\ell=1}^{p}\delta_\ell = 0,
$$

and the $e_{ijk\ell}$'s are normal random variables with mean zero and variance $\sigma_e^2$.


Under the assumptions of the random effects model, the $\alpha_i$'s, $\beta_j$'s, $\tau_k$'s, and
$\delta_\ell$'s are also normal random variables with mean zero and variances $\sigma_\alpha^2$,
$\sigma_\beta^2$, $\sigma_\tau^2$, and $\sigma_\delta^2$, respectively. Other assumptions leading to a
mixed model can also be made. The expected mean squares are then readily derived, and the complete
analysis of variance is shown in Table 10.11. The null hypotheses of equal
effects for rows, columns, Greek letters, and Latin letters are tested by dividing
the corresponding mean squares by the error mean square.


Remark: The Graeco-Latin square design has not been used much because the exper-
imental units cannot easily be balanced in all three groupings. Some early applications
were described by Dunlop (1933), for testing 5 feeding treatments on pigs, and by Tippett
(1934), in an industrial experiment. Perry et al. (1980) describe an application of the
design in experiments comparing different insect sex attractants, together with its
advantages and disadvantages. Discussions of the analysis of variance of the design when some
observations are missing can be found in Yates (1933), Nair (1940), Davies (1960), and
Dodge and Shah (1977). When p < 6, the number of error degrees of freedom is rather
inadequate and the design is not practical (see Cochran and Cox (1957, p. 133)).
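The shortage of error degrees of freedom for small p follows from the partition given earlier: four sets of p - 1 effect degrees of freedom leave only (p - 1)(p - 3) of the p² - 1 total for error. A quick Python tabulation (the function name is illustrative) makes this concrete:

```python
def graeco_latin_df(p):
    """Degrees of freedom for a p x p Graeco-Latin square."""
    per_factor = p - 1                       # rows, columns, Greek letters, Latin letters
    error = (p - 1) * (p - 3)
    total = p * p - 1
    assert 4 * per_factor + error == total   # the partition must add up
    return per_factor, error, total


for p in (3, 4, 5, 7, 8):                    # no 6 x 6 Graeco-Latin square exists
    per_factor, error, total = graeco_latin_df(p)
    print(f"p = {p}: total {total}, each factor {per_factor}, error {error}")
```

For p = 3 there are no error degrees of freedom at all, and p = 4 leaves only 3.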
TABLE 10.12
Data on Photographic Density for Different Brands of Flash Bulbs*

                                     Camera
Film        1           2           3           4           5
1       0.64 (Aα)   0.70 (Bγ)   0.73 (Cε)   0.66 (Dβ)   0.66 (Eδ)
2       0.62 (Bβ)   0.63 (Cδ)   0.69 (Dα)   0.70 (Eγ)   0.78 (Aε)
3       0.65 (Cγ)   0.72 (Dε)   0.68 (Eβ)   0.64 (Aδ)   0.74 (Bα)
4       0.64 (Dδ)   0.73 (Eα)   0.68 (Aγ)   0.74 (Bε)   0.72 (Cβ)
5       0.74 (Eε)   0.73 (Aβ)   0.67 (Bδ)   0.74 (Cα)   0.78 (Dγ)

Source: Johnson and Leone (1964, p. 175). Used with permission.

* The original experiment reported duplicate measurements. Only the first
set of readings is presented here.


TABLE 10.13
Analysis of Variance for the Data on Photographic
Density for Different Brands of Flash Bulbs

Source of     Degrees of    Sum of      Mean
Variation     Freedom       Squares     Square      F Value   p-Value
Film           4            0.00950     0.00237      7.11     0.010
Camera         4            0.01558     0.00389     11.66     0.002
Brand          4            0.00026     0.00006      0.19     0.936
Filter         4            0.02398     0.00599     17.95    <0.001
Error          8            0.00267     0.00033

Total         24            0.05198


WORKED EXAMPLE


Johnson and Leone (1964, p. 175) presented data from an experiment conducted 
to study the effect of different brands of flash bulbs on photographic density. 
A 5 x 5 Graeco-Latin square design with 5 varieties of cameras, 5 film types, 
and 5 filter types was used. The data are given in Table 10.12 where the Roman 
letter within parentheses represents the brand and the Greek letter represents 
the filter type. 

The analysis of variance calculations are readily performed and the results 
are shown in Table 10.13. The outputs illustrating the applications of statistical 
packages to perform analysis of variance are presented in Figure 10.7. There 
does not seem to be a significant effect of different brands of flash bulbs on 


DATA PHOTOGRPH;
INPUT FILM CAMERA BRAND $
      FILTER $ DENSITY;
CARDS;
1 1 A α 0.64
. . .
5 5 D γ 0.78
;
PROC ANOVA;
CLASSES FILM CAMERA
        BRAND FILTER;
MODEL DENSITY=FILM
      CAMERA BRAND FILTER;
RUN;

The SAS System
Analysis of Variance Procedure

Class Level Information
CLASS     LEVELS
FILM      5
CAMERA    5
BRAND     5
FILTER    5
NUMBER OF OBS. IN DATA SET=25

Dependent Variable: DENSITY
                          Sum of         Mean
Source            DF      Squares        Square        F Value   Pr > F
Model             16      0.04930400     0.00308150      9.23    0.0017
Error              8      0.00267200     0.00033400
Corrected Total   24      0.05197600

R-Square    C.V.        Root MSE      DENSITY Mean
0.948592    2.6243060   0.01827567    0.69640000

Source    DF   Anova SS      Mean Square   F Value   Pr > F
FILM       4   0.00949600    0.00237400      7.11    0.0096
CAMERA     4   0.01557600    0.00389400     11.66    0.0020
BRAND      4   0.00025600    0.00006400      0.19    0.9361
FILTER     4   0.02397600    0.00599400     17.95    0.0005

(i) SAS application: SAS ANOVA instructions and output for the Graeco-Latin square 
design. 


DATA LIST
  /FILM 1 CAMERA 3
   BRAND 5 FILTER 7
   DENSITY 9-12(2).
BEGIN DATA.
1 1 1 1 0.64
. . .
5 5 4 3 0.78
END DATA.
MANOVA DENSITY BY
  FILM(1,5)
  CAMERA(1,5)
  BRAND(1,5)
  FILTER(1,5)
  /DESIGN=FILM
   CAMERA BRAND
   FILTER.

Analysis of Variance--Design 1
Tests of Significance for DENSITY using UNIQUE sums of squares

Source of Variation      SS      Sig of F
RESIDUAL                .00
FILM                    .01        .010
CAMERA                  .02        .002
BRAND                   .00        .936
FILTER                  .02        .000
(Model)                 .05        .002
(Total)                 .05

R-Squared = .949
Adjusted R-Squared = .846


(ii) SPSS application: SPSS MANOVA instructions and output for the Graeco-Latin 
square design. 


/INPUT     FILE='C:\SAHAI\TEXTO\EJE26.TXT'.
           FORMAT=FREE.
           VARIABLES=5.
/VARIABLE  NAMES=F,C,B,FI,DENS.
/GROUP     VARIABLE=F,C,B,FI.
           CODES(F)=1,2,3,4,5.
           NAMES(F)=F1,...,F5.
           CODES(C)=1,2,3,4,5.
           NAMES(C)=C1,...,C5.
           CODES(B)=1,2,3,4,5.
           NAMES(B)=B1,...,B5.
           CODES(FI)=1,2,3,4,5.
           NAMES(FI)=FI1,...,FI5.
/DESIGN    DEPENDENT=DENSITY.
           INCLUDE=1,2,3,4.
/END
1 1 1 1 0.64
. . .
5 5 4 3 0.78

BMDP2V - ANALYSIS OF VARIANCE AND COVARIANCE WITH REPEATED MEASURES
Release: 7.0 (BMDP/DYNAMIC)

ANALYSIS OF VARIANCE FOR THE 1-ST DEPENDENT VARIABLE
THE TRIALS ARE REPRESENTED BY THE VARIABLES: DENSITY

SOURCE     SUM OF      D.F.   MEAN           F
           SQUARES            SQUARE
MEAN       12.12432     1     12.12432    36300.37
FILM         .00950     4       .00237        7.11
CAMERA       .01558     4       .00389       11.66
BRAND        .00026     4       .00006        0.19
FILTER       .02398     4       .00599       17.95
ERROR        .00267     8       .00033


(iii) BMDP application: BMDP 2V instructions and output for the Graeco-Latin square 
design. 


FIGURE 10.7 Program Instructions and Output for the Graeco-Latin Square 
Design: Data on Photographic Density for Different Brands of Flash Bulbs (Table 
10.12). 
photographic density. The use of a Graeco-Latin square design in reducing
variability due to varieties of camera, film types, and filter types seems to have been
highly effective.
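The sums of squares in Table 10.13 can be reproduced directly from the formulas given earlier in this section. The following is a minimal Python sketch using plain lists (no statistics library is assumed); each cell of Table 10.12 is stored as a hypothetical `(density, brand, filter)` tuple, with films as rows and cameras as columns.

```python
# Table 10.12: rows are films 1-5, columns are cameras 1-5;
# each cell is (density, brand as a Latin letter, filter as a Greek letter).
DATA = [
    [(0.64, "A", "α"), (0.70, "B", "γ"), (0.73, "C", "ε"), (0.66, "D", "β"), (0.66, "E", "δ")],
    [(0.62, "B", "β"), (0.63, "C", "δ"), (0.69, "D", "α"), (0.70, "E", "γ"), (0.78, "A", "ε")],
    [(0.65, "C", "γ"), (0.72, "D", "ε"), (0.68, "E", "β"), (0.64, "A", "δ"), (0.74, "B", "α")],
    [(0.64, "D", "δ"), (0.73, "E", "α"), (0.68, "A", "γ"), (0.74, "B", "ε"), (0.72, "C", "β")],
    [(0.74, "E", "ε"), (0.73, "A", "β"), (0.67, "B", "δ"), (0.74, "C", "α"), (0.78, "D", "γ")],
]


def graeco_latin_anova(data):
    """Sums of squares for a p x p Graeco-Latin square, following the formulas in the text."""
    p = len(data)
    cells = [(i, j, y, latin, greek)
             for i, row in enumerate(data)
             for j, (y, latin, greek) in enumerate(row)]
    grand = sum(c[2] for c in cells) / (p * p)

    def factor_ss(key):
        # p * sum over the p levels of (level mean - grand mean)^2
        totals = {}
        for c in cells:
            totals[key(c)] = totals.get(key(c), 0.0) + c[2]
        return p * sum((t / p - grand) ** 2 for t in totals.values())

    ss = {
        "Film (rows)": factor_ss(lambda c: c[0]),
        "Camera (columns)": factor_ss(lambda c: c[1]),
        "Brand (Latin)": factor_ss(lambda c: c[3]),
        "Filter (Greek)": factor_ss(lambda c: c[4]),
    }
    ss_total = sum((c[2] - grand) ** 2 for c in cells)
    ss["Error"] = ss_total - sum(ss.values())   # obtained by subtraction
    ss["Total"] = ss_total
    return ss


ss = graeco_latin_anova(DATA)
ms_error = ss["Error"] / 8                     # (p - 1)(p - 3) = 8 for p = 5
for source in ("Film (rows)", "Camera (columns)", "Brand (Latin)", "Filter (Greek)"):
    f = (ss[source] / 4) / ms_error
    print(f"{source:17s} SS = {ss[source]:.5f}  F = {f:.2f}")
print(f"Error             SS = {ss['Error']:.5f}")
```

Computed without rounding the mean squares first, the F values agree with the SAS output in Figure 10.7.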


10.6 SPLIT-PLOT DESIGN 


The split-plot design can be considered as a special case of the two-factor random-
ized block design in which one wants to obtain more precise information about
one factor and about the interaction between the two factors, the other
factor being of secondary importance to the experimenter. Thus, suppose there
are two factors A and B having a and b levels, respectively. As described in
the previous sections, one might use a completely randomized design by com-
pletely randomizing the a x b treatment combinations, or a randomized block
design in, say, r randomized blocks, each block containing a x b plots. Alter-
natively, suppose we wish to evaluate the effects of factor B and the interaction
between the factors A and B with greater precision than the effects of factor A.
In this situation, one could arrange the treatments of factor A in a randomized
block design of r blocks as described earlier. Each of the a x r plots can then
be divided into b subplots so that the treatments of factor B can be allo-
cated at random to the subplots within each plot. This design yields more precise information
on the factor allocated to the split-plots or subplots at the expense of less precise
information on the factor assigned to the whole-plots.

As explained previously, the principal advantage of this type of design lies
in the fact that, since no attempt is made to obtain accurate information
about factor A, larger plots can be used to allocate the a treatments of factor A
without any consideration of the variability within the blocks. If a = 3, b = 4,
and r = 3, a split-plot design may be laid out as shown in Figure 10.8.


[Figure 10.8: layout of a split-plot design with a = 3, b = 4, and r = 3 (Blocks I, II, and III).]


FIGURE 10.8 A Layout of a Split-Plot Design. 


Note that the essential feature of a split-plot design is that the a x b x r
experimental units are not obtained after random allocation over all a x b x r
units, as in a completely randomized design, nor after r separate ran-
domizations over a x b units, as in a simple randomized block design. Instead, they
are obtained by first randomizing the treatments of factor B over the b subplots (this
randomization being performed a x r times) and then randomizing the treat-
ments of factor A over the a whole-plots (this randomization being performed
r times, once for each of the r blocks). The name split-plot has its origin in
agricultural experimentation, where the terms whole-plots (large areas of land)
and subplots (small areas of land) are in common use.
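The two-stage randomization can be sketched in a few lines of Python. This is an illustrative sketch only: the level labels `A1`, `B1`, ... and the function name are hypothetical, and Python's `random` module stands in for whatever randomization device is used in practice. Because the whole-plot and subplot randomizations are independent, performing the factor A step first within each block, as the code does, gives the same kind of layout as the order described above.

```python
import random


def split_plot_layout(a, b, r, seed=None):
    """Randomize a split-plot design with a whole-plot levels, b subplot levels, r blocks.

    Whole-plot step: within each block, the a levels of factor A are randomly
    assigned to the a whole-plots (r randomizations in all).
    Subplot step: within each whole-plot, the b levels of factor B are randomly
    assigned to the b subplots (a * r randomizations in all).
    """
    rng = random.Random(seed)
    layout = []
    for block in range(1, r + 1):
        a_levels = [f"A{j}" for j in range(1, a + 1)]
        rng.shuffle(a_levels)                        # whole-plot randomization
        whole_plots = []
        for a_level in a_levels:
            b_levels = [f"B{k}" for k in range(1, b + 1)]
            rng.shuffle(b_levels)                    # subplot randomization
            whole_plots.append((a_level, b_levels))
        layout.append((block, whole_plots))
    return layout


for block, whole_plots in split_plot_layout(a=3, b=4, r=3, seed=1):
    print(f"Block {block}:", "  ".join(f"{a}[{' '.join(bs)}]" for a, bs in whole_plots))
```

With a = 3, b = 4, and r = 3 this produces a layout of the same shape as Figure 10.8.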

When there is a choice in a split-plot design, the more important treatments,
requiring a higher level of precision, should be assigned to the subplots, and
the treatments of secondary importance should be assigned to the whole-plots.
However, in many industrial and laboratory experiments, the treatments that
cannot be administered on a small scale are applied to whole-plots, and the treat-
ments that can conveniently be applied on a small scale are assigned to the sub-
plots. In such cases, the choice of a split-plot design is dictated purely by administrative and
logistic considerations rather than by the precision of the desired information.


MODEL AND ANALYSIS 


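The model for the split-plot design described previously is

$$
y_{ijk} = \mu + B_i + \alpha_j + e_{ij} + \beta_k + (B\beta)_{ik} + (\alpha\beta)_{jk} + \varepsilon_{ijk},
\qquad
\begin{cases}
i = 1, \ldots, r, \\
j = 1, \ldots, a, \\
k = 1, \ldots, b,
\end{cases}
$$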

where $\mu$ is the general or overall mean, $B_i$ is the effect of the i-th block, $\alpha_j$ is
the effect of the j-th treatment of factor A, $e_{ij}$ is the whole-plot error, $\beta_k$ is the
effect of the k-th treatment of factor B, $(B\beta)_{ik}$ is the interaction between the
i-th block and the k-th treatment of factor B, $(\alpha\beta)_{jk}$ is the interaction between
the j-th treatment of factor A and the k-th treatment of factor B, and $\varepsilon_{ijk}$ is the
subplot error. Note that $e_{ij}$ is the same as the $(B\alpha)_{ij}$ interaction and $\varepsilon_{ijk}$ is the
same as the $(B\alpha\beta)_{ijk}$ interaction.

Usually, the blocks are considered random and factors A and B fixed.
Thus, the $B_i$'s, $(B\beta)_{ik}$'s, $e_{ij}$'s, and $\varepsilon_{ijk}$'s are normally distributed with mean zero
and variances $\sigma_B^2$, $\sigma_{B\beta}^2$, $\sigma_e^2$, and $\sigma_\varepsilon^2$, respectively. If both A and B are random,
the $\alpha_j$'s, $\beta_k$'s, and $(\alpha\beta)_{jk}$'s are also assumed to be normally distributed with zero
means and variances $\sigma_\alpha^2$, $\sigma_\beta^2$, and $\sigma_{\alpha\beta}^2$, respectively. Mixed models with A fixed
and B random, or B fixed and A random, can also arise, and their assumptions
are stated analogously. The analysis of variance is performed in exactly the
same manner as before. Thus, the total sum of squares is partitioned by the
identity


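$$
SS_T = SS_{Bl} + SS_A + SS_{E_w} + SS_B + SS_{Bl \times B} + SS_{A \times B} + SS_{E_s},
$$

where $SS_{E_w}$ and $SS_{E_s}$ denote the whole-plot and subplot error sums of squares, respectively, and

$$
\begin{aligned}
SS_T &= \sum_{i=1}^{r}\sum_{j=1}^{a}\sum_{k=1}^{b} (y_{ijk} - \bar{y}_{...})^2, \\
SS_{Bl} &= ab\sum_{i=1}^{r} (\bar{y}_{i..} - \bar{y}_{...})^2, \\
SS_A &= rb\sum_{j=1}^{a} (\bar{y}_{.j.} - \bar{y}_{...})^2, \\
SS_{E_w} &= b\sum_{i=1}^{r}\sum_{j=1}^{a} (\bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...})^2, \\
SS_B &= ra\sum_{k=1}^{b} (\bar{y}_{..k} - \bar{y}_{...})^2, \\
SS_{Bl \times B} &= a\sum_{i=1}^{r}\sum_{k=1}^{b} (\bar{y}_{i.k} - \bar{y}_{i..} - \bar{y}_{..k} + \bar{y}_{...})^2, \\
SS_{A \times B} &= r\sum_{j=1}^{a}\sum_{k=1}^{b} (\bar{y}_{.jk} - \bar{y}_{.j.} - \bar{y}_{..k} + \bar{y}_{...})^2,
\end{aligned}
$$

and

$$
SS_{E_s} = \sum_{i=1}^{r}\sum_{j=1}^{a}\sum_{k=1}^{b}
(y_{ijk} - \bar{y}_{ij.} - \bar{y}_{i.k} - \bar{y}_{.jk} + \bar{y}_{i..} + \bar{y}_{.j.} + \bar{y}_{..k} - \bar{y}_{...})^2.
$$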
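The formulas above translate directly into code. A minimal Python sketch, assuming complete, balanced data stored as a nested list `y[i][j][k]` (block i, level j of factor A, level k of factor B); the function name and the dictionary labels are illustrative, not from any package:

```python
def split_plot_ss(y):
    """Sums of squares for a split-plot design laid out in randomized blocks.

    y[i][j][k] is the response in block i, whole-plot treatment j (factor A),
    and subplot treatment k (factor B); the data must be complete and balanced.
    """
    r, a, b = len(y), len(y[0]), len(y[0][0])
    cells = [(i, j, k) for i in range(r) for j in range(a) for k in range(b)]
    g = sum(y[i][j][k] for i, j, k in cells) / (r * a * b)                  # grand mean
    m_i = [sum(y[i][j][k] for j in range(a) for k in range(b)) / (a * b) for i in range(r)]
    m_j = [sum(y[i][j][k] for i in range(r) for k in range(b)) / (r * b) for j in range(a)]
    m_k = [sum(y[i][j][k] for i in range(r) for j in range(a)) / (r * a) for k in range(b)]
    m_ij = [[sum(y[i][j]) / b for j in range(a)] for i in range(r)]
    m_ik = [[sum(y[i][j][k] for j in range(a)) / a for k in range(b)] for i in range(r)]
    m_jk = [[sum(y[i][j][k] for i in range(r)) / r for k in range(b)] for j in range(a)]

    ss = {
        "Blocks": a * b * sum((m - g) ** 2 for m in m_i),
        "A": r * b * sum((m - g) ** 2 for m in m_j),
        "Whole-plot error": b * sum((m_ij[i][j] - m_i[i] - m_j[j] + g) ** 2
                                    for i in range(r) for j in range(a)),
        "B": r * a * sum((m - g) ** 2 for m in m_k),
        "Blocks x B": a * sum((m_ik[i][k] - m_i[i] - m_k[k] + g) ** 2
                              for i in range(r) for k in range(b)),
        "A x B": r * sum((m_jk[j][k] - m_j[j] - m_k[k] + g) ** 2
                         for j in range(a) for k in range(b)),
        "Subplot error": sum((y[i][j][k] - m_ij[i][j] - m_ik[i][k] - m_jk[j][k]
                              + m_i[i] + m_j[j] + m_k[k] - g) ** 2
                             for i, j, k in cells),
    }
    ss["Total"] = sum((y[i][j][k] - g) ** 2 for i, j, k in cells)
    return ss


# Hypothetical, purely additive data: only Blocks, A, and B should be nonzero.
y = [[[i + 2 * j + 3 * k for k in range(3)] for j in range(2)] for i in range(2)]
for source, value in split_plot_ss(y).items():
    print(f"{source:17s} {value:8.3f}")
```

For balanced data the seven components always add up to the total sum of squares, which is a convenient check on hand calculations.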


The complete analysis of variance, including the degrees of freedom and
expected mean squares, is shown in Table 10.14. When both A and B are fixed,
we can test the block and factor A effects against the whole-plot error. Similarly,
the Bl x B and A x B interactions can be tested against the subplot error. The
factor B effect can be tested against the Bl x B interaction. Sometimes the
Bl x B interaction is considered to be negligible and is not included in
the model. It is then pooled with the subplot error, and the B main effect as
well as the A x B interaction are tested against the subplot error. Under Models
II and III, however, exact tests may not always exist, and pseudo-F tests as
discussed in Section 5.5 would have to be employed.


Remarks: (i) The split-plot technique may also be applied to a Latin square design. The
a x a Latin square corresponds to the whole-plot treatments, and each whole-plot can be
further subdivided into b subplots. The A treatments are applied randomly to whole-
plots and the B treatments are applied randomly to subplots within a whole-plot. The statistical
analysis of such a design proceeds along lines similar to that for a randomized block design. The
first stage is an analysis of the a² whole-plots and the second stage is an analysis of the
subplots within whole-plots.

(ii) The problem of estimating missing values in a split-plot design has been studied
by Anderson (1946) and Khargonkar (1948). Formulae for estimating the standard errors
of differences between two means involving missing values are given by Cochran and
Cox (1957, pp. 302-303) and are also reported in Steel and Torrie (1980, pp. 388-390).
A more complete description of this design can be found in the books by Cochran and
Cox (1957, Chapter 7), Steel and Torrie (1980, Chapter 16), Fleiss (1986, Chapter 13),
Damon and Harvey (1987, Chapter 7), Snedecor and Cochran (1989, pp. 324-329), and
Hinkelmann and Kempthorne (1994, Chapter 13).


WORKED EXAMPLE


Steel and Torrie (1980, p. 387) reported data from an experiment conducted by
J. W. Lambert, at the University of Minnesota, to compare the effect of row
spacing on the yields of two varieties of soybean. A split-plot design was used
with varieties as whole-plots; each whole-plot was divided into five subplots, to
which the five row spacings (18, 24, 30, 36, and 42 inches) were applied. The
varieties, as whole-plot treatments, were allocated in six blocks using a randomized
complete block layout. The data on yields in bushels per acre for the six blocks are
given in Table 10.15.

The analysis of variance computations are readily performed and the results
are summarized in Table 10.16. The outputs illustrating the applications of
statistical packages to perform the analysis of variance are presented in Figure
10.9. In performing tests of significance, blocks and the whole-plot error (block x
variety interaction) are considered random, leading to the expected mean squares
shown in Table 10.16. We may conclude that there are highly significant dif-
ferences due to both varieties and row spacings. No significant differences
are found due to blocks or to the block x spacing and variety x spacing
interactions.
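Each F value in Table 10.16 is a ratio of two mean squares, and the denominator is the source whose expected mean square matches the numerator's when the null hypothesis holds. A minimal Python sketch using the mean squares from the table (the dictionary names are illustrative) makes the choice of error term explicit:

```python
# Mean squares and degrees of freedom from Table 10.16 (soybean split-plot data).
ms = {
    "Block": (6.0718, 5),
    "Variety": (477.7082, 1),
    "Whole-plot error": (3.0078, 5),
    "Row spacing": (51.5261, 4),
    "Block x Spacing": (4.3707, 20),
    "Variety x Spacing": (6.3636, 4),
    "Subplot error": (5.3812, 20),
}

# Denominator for each test, read off the expected mean squares in Table 10.16:
# the error term is the source whose EMS equals the numerator's EMS under H0.
error_term = {
    "Block": "Whole-plot error",
    "Variety": "Whole-plot error",
    "Whole-plot error": "Subplot error",
    "Row spacing": "Block x Spacing",
    "Block x Spacing": "Subplot error",
    "Variety x Spacing": "Subplot error",
}

f_values = {src: ms[src][0] / ms[den][0] for src, den in error_term.items()}
for src, den in error_term.items():
    print(f"{src:18s} F({ms[src][1]}, {ms[den][1]}) = {f_values[src]:.3f}")
```

The printed degrees of freedom show why the variety test is weak despite its large F: its denominator has only 5 degrees of freedom, whereas the row-spacing test has 20.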


10.7 OTHER DESIGNS


The designs described so far in this chapter are relatively simple and commonly
used. There are a great number of other designs, which differ mainly in the
experimental conditions they accommodate, such as limited resources, and in the
way they attempt to reduce the error variance. In this section, we briefly review some designs
that are occasionally useful in scientific experimentation. Further details can
be found in Kempthorne (1952), Federer (1955), Cochran and Cox (1957), and
Das and Giri (1976).


INCOMPLETE BLOCK DESIGNS 


In a randomized block design, each treatment must be present in every block.
However, when there are too many treatments, it may not be possible to ac-
commodate all factor levels or treatment combinations in each block because
of limitations on the size of the block (amount of work or space) or a lack of
experimental resources. To overcome this problem, block designs
are used in which not every treatment occurs in every block. These designs
are commonly known as incomplete block designs. There are several types of
TABLE 10.15
Data on Yields of Two Varieties of Soybean

                        Block 1            Blocks 2 to 6
Row Spacing            Variety*            Variety
(in.)              OM          B           OM        B
18                 33.6        28.0        . . .
24                 31.1        23.7        . . .
30                 33.0        23.5        . . .
36                 28.4        25.0        . . .
42                 31.4        25.7        . . .

Source: Steel and Torrie (1980, p. 387). Used with permission.
* OM = Ottawa Mandarin, B = Blackhawk.


TABLE 10.16
Analysis of Variance for the Data on Yields of Two Varieties of Soybean

Source of              Degrees of  Sum of     Mean       Expected Mean Square                                      F Value   p-Value
Variation              Freedom     Squares    Square
Block                    5          30.3588     6.0718   σ_ε² + 5σ_e² + 2 x 5σ_B²                                    2.019     0.230
Variety                  1         477.7082   477.7082   σ_ε² + 5σ_e² + [(6 x 5)/(2 - 1)] Σ_j α_j²                 158.823    <0.001
Whole-plot error         5          15.0388     3.0078   σ_ε² + 5σ_e²                                                0.559     0.730
  (Block x Variety)
Row spacing              4         206.1043    51.5261   σ_ε² + 2σ_Bβ² + [(6 x 2)/(5 - 1)] Σ_k β_k²                 11.789    <0.001
Block x Spacing         20          87.4137     4.3707   σ_ε² + 2σ_Bβ²                                               0.812     0.677
Variety x Spacing        4          25.4543     6.3636   σ_ε² + [6/((2 - 1)(5 - 1))] Σ_j Σ_k (αβ)_jk²                1.183     0.348
Subplot error           20         107.6237     5.3812   σ_ε²
  (Block x Variety
   x Spacing)
Total                   59         949.7018

(Sums over j run from 1 to 2, and sums over k run from 1 to 5.)
INPUT BLOCK SPACING $
      VARIETY $ YIELD;
DATALINES;
1 18" OM 33.6
1 24" OM 31.1
. . .
;
PROC GLM;
CLASSES BLOCK VARIETY SPACING;
MODEL YIELD=BLOCK VARIETY SPACING
      BLOCK*VARIETY SPACING*BLOCK
      VARIETY*SPACING;
RANDOM BLOCK BLOCK*VARIETY BLOCK*SPACING;
TEST H=BLOCK E=BLOCK*VARIETY;
TEST H=VARIETY E=BLOCK*VARIETY;
TEST H=SPACING E=BLOCK*SPACING;
RUN;

The SAS System
General Linear Models Procedure

Class Level Information
CLASS     LEVELS   VALUES
BLOCK     6        1 2 3 4 5 6
VARIETY   2        B OM
SPACING   5        18" 24" 30" 36" 42"

Dependent Variable: YIELD
                          Sum of           Mean
Source            DF      Squares          Square          F Value   Pr > F
Model             39      842.07816667     21.59174786       4.01    0.0008
Error             20      107.62366667      5.38118333
Corrected Total   59      949.70183333

R-Square    C.V.        Root MSE     YIELD Mean
0.886676    8.2518685   2.3197378    28.11166667

Source            DF   Type III SS    Mean Square    F Value   Pr > F
BLOCK              5    30.358833       6.071767       1.13    0.3776
VARIETY            1   477.708167     477.708167      88.77    0.0001
SPACING            4   206.104333      51.526083       9.58    0.0002
BLOCK*VARIETY      5    15.038833       3.007767       0.56    0.7301
BLOCK*SPACING     20    87.413667       4.370683       0.81    0.6768
VARIETY*SPACING    4    25.454333       6.363583       1.18    0.3486

Source            Type III Expected Mean Square
BLOCK             Var(Error) + 2 Var(BLOCK*SPACING) + 5 Var(BLOCK*VARIETY)
                  + 10 Var(BLOCK)
VARIETY           Var(Error) + 5 Var(BLOCK*VARIETY)
                  + Q(VARIETY,VARIETY*SPACING)
SPACING           Var(Error) + 2 Var(BLOCK*SPACING)
                  + Q(SPACING,VARIETY*SPACING)
BLOCK*VARIETY     Var(Error) + 5 Var(BLOCK*VARIETY)
BLOCK*SPACING     Var(Error) + 2 Var(BLOCK*SPACING)
VARIETY*SPACING   Var(Error) + Q(VARIETY*SPACING)

Tests of Hypotheses using the Type III MS for BLOCK*VARIETY as an error term
Source    DF   Type III SS     Mean Square     F Value
BLOCK      5   30.35883333     6.07176667        2.02

Tests of Hypotheses using the Type III MS for BLOCK*VARIETY as an error term
Source    DF   Type III SS     Mean Square     F Value
VARIETY    1   477.70816667    477.70816667    158.82

Tests of Hypotheses using the Type III MS for BLOCK*SPACING as an error term
Source    DF   Type III SS     Mean Square     F Value
SPACING    4   206.10433333    51.52608333      11.79

(i) SAS application: SAS GLM instructions and output for the split-plot design.


DATA LIST
  /BLOCK 1 SPACING 3-4
   VARIETY 6 YIELD 8-11(1).
BEGIN DATA.
1 18 1 33.6
1 24 1 31.1
1 30 1 33.0
1 36 1 28.4
1 42 1 31.4
1 18 2 28.0
1 24 2 23.7
1 30 2 23.5
1 36 2 25.0
1 42 2 25.7
. . .
END DATA.
GLM YIELD BY BLOCK
    SPACING VARIETY
  /DESIGN=BLOCK
    VARIETY SPACING
    BLOCK*SPACING
    BLOCK*VARIETY
    SPACING*VARIETY
  /RANDOM BLOCK.

Tests of Between-Subjects Effects          Dependent Variable: YIELD

Source                          Type III SS     df      Mean Square    F         Sig.
BLOCK            Hypothesis       30.359          5       6.072        3.040     .421
                 Error             1.891         .947     1.997(a)
VARIETY          Hypothesis      477.708          1     477.708      158.825     .000
                 Error            15.039          5       3.008(b)
SPACING          Hypothesis      206.104          4      51.526       11.789     .000
                 Error            87.414         20       4.371(c)
BLOCK*SPACING    Hypothesis       87.414         20       4.371         .812     .677
                 Error           107.624         20       5.381(d)
BLOCK*VARIETY    Hypothesis       15.039          5       3.008         .559     .730
                 Error           107.624         20       5.381(d)
SPACING*VARIETY  Hypothesis       25.454          4       6.364        1.183     .349
                 Error           107.624         20       5.381(d)
a MS(B*S) + MS(B*V) - MS(E)   b MS(B*V)   c MS(B*S)   d MS(Error)

Expected Mean Squares (a,b)
                              Variance Component
Source            Var(B)   Var(B*S)   Var(B*V)   Var(Error)   Quadratic Term
BLOCK             10.000     2.000      5.000      1.000
VARIETY             .000      .000      5.000      1.000      Variety
SPACING             .000     2.000       .000      1.000      Spacing
BLOCK*SPACING       .000     2.000       .000      1.000
BLOCK*VARIETY       .000      .000      5.000      1.000
SPACING*VARIETY     .000      .000       .000      1.000      Variety*Spacing
Error               .000      .000       .000      1.000
a For each source, the expected mean square equals the sum of the coefficients in the
  cells times the variance components, plus a quadratic term involving effects in the
  Quadratic Term cell.
b Expected Mean Squares are based on the Type III Sums of Squares.


(ii) SPSS application: SPSS GLM instructions and output for the split-plot design. 


FIGURE 10.9 Program Instructions and Output for the Split-Plot Design: Data 
on Yields of Two Varieties of Soybean (Table 10.15). 
/INPUT     FILE='C:\SAHAI\TEXTO\EJE27.TXT'.
           FORMAT=FREE.
           VARIABLES=5.
/VARIABLE  NAMES=S1,S2,S3,S4,S5.
/DESIGN    NAMES=BLOCK,VARIETY,SPACING.
           LEVELS=6,2,5.
           RANDOM=BLOCK.
           FIXED=VARIETY,SPACING.
           MODEL='B,V,S'.
/END
33.6 31.1 33.0 28.4 31.4
28.0 23.7 23.5 25.0 25.7
. . .
36.1 30.3 27.9 26.9 33.4
28.3 23.8 22.0 24.5 22.9

BMDP8V - GENERAL MIXED MODEL ANALYSIS OF VARIANCE - EQUAL CELL SIZES
Release: 7.0 (BMDP/DYNAMIC)

ANALYSIS OF VARIANCE DESIGN
                      B      V      S
NUMBER OF LEVELS      6      2      5
POPULATION SIZE     INF      2      5
MODEL  B, V, S

ANALYSIS OF VARIANCE FOR DEPENDENT VARIABLE 1

   SOURCE    ERROR     SUM OF        D.F.   MEAN          F         PROB.
             TERM      SQUARES              SQUARE
1  MEAN      BLOCK   47415.9477       1    47415.948    7809.25    0.0000
2  BLOCK                30.3588       5        6.072
3  VARIETY   BV        477.7082       1      477.708     158.83    0.0001
4  SPACING   BS        206.1043       4       51.526      11.79    0.0000
5  BV                   15.0388       5        3.008
6  BS                   87.4137      20        4.371
7  VS        BVS        25.4543       4        6.364       1.18    0.3486
8  BVS                 107.6237      20        5.381

   SOURCE    EXPECTED MEAN     ESTIMATES OF VARIANCE
             SQUARE            COMPONENTS
1  MEAN      60(1)+10(2)         790.16460
2  BLOCK     10(2)                 0.60718
3  VARIETY   30(3)+5(5)           15.82335
4  SPACING   12(4)+2(6)            3.92962
5  BV        5(5)                  0.60155
6  BS        2(6)                  2.18534
7  VS        6(7)+(8)              0.16373
8  BVS       (8)                   5.38118

(iii) BMDP application: BMDP 8V instructions and output for the split-plot design.


FIGURE 10.9 (continued) 


incomplete block designs, the simplest of which involve blocks of equal size
and all treatments equally replicated. If an incomplete block design has t treat-
ments and b blocks, with c experimental units within each block and r
replications of each treatment, then the average number of times any two treatments
appear together in a block is λ = r(c - 1)/(t - 1) = n(c - 1)/[t(t - 1)], where
n = tr = bc. When it is desired to make all treatment comparisons with equal preci-
sion, the incomplete block designs are formed such that every pair of treatments
occurs together the same number of times, λ. Such designs are called balanced
incomplete block designs and were originally proposed by Yates (1936a). For
a list of some useful balanced incomplete block designs, see Box et al. (1978,
pp. 270-274). Balanced incomplete block designs do not always exist or may
require an excessively large number of blocks. To reduce the number of blocks required
in an experiment, the experimenter can employ designs known as partially
balanced incomplete block designs, in which different pairs of treatments ap-
pear together a different number of times. For further discussions of incomplete
block designs, see Cochran and Cox (1957, Chapters 9 and 13) and Cox (1958a,
pp. 231-245).
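The concurrence count λ, and whether a design is balanced, can be checked mechanically by counting how often each pair of treatments shares a block. A minimal Python sketch, assuming equal block sizes and equal replication; the seven-treatment design used here is a classical balanced incomplete block design chosen for illustration, and the function name is hypothetical:

```python
from collections import Counter
from itertools import combinations


def design_summary(blocks):
    """Return (t, b, c, r, lambda, balanced) for an equireplicate, equal-block-size design."""
    treatments = sorted({trt for block in blocks for trt in block})
    t, b, c = len(treatments), len(blocks), len(blocks[0])
    r = b * c // t                                  # replications, assuming equal replication
    lam = r * (c - 1) / (t - 1)                     # average pairwise concurrence
    counts = Counter()
    for block in blocks:
        counts.update(combinations(sorted(block), 2))
    balanced = all(counts[pair] == lam for pair in combinations(treatments, 2))
    return t, b, c, r, lam, balanced


# Seven treatments in seven blocks of three: a classical balanced incomplete block design.
blocks = [(1, 2, 4), (2, 3, 5), (3, 4, 6), (4, 5, 7), (5, 6, 1), (6, 7, 2), (7, 1, 3)]
print(design_summary(blocks))    # (7, 7, 3, 3, 1.0, True)
```

Here every pair of the seven treatments occurs together exactly once, in agreement with λ = 3(3 - 1)/(7 - 1) = 1.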


LATTICE DESIGNS 


Lattice designs are a class of incomplete block designs introduced by Yates
(1937b) to increase the precision of treatment comparisons in agricultural
crop cultivar trials. The designs are also sometimes called quasi-factorials
because of their analogy to confounding in factorial experiments. For example,
if k² treatments are to be compared, one can arrange them as the points of a
two-dimensional lattice and regard the points as representing the treatments
in a two-factor experiment. Suppose a balanced incomplete block layout with
k² treatments is arranged in b = k(k + 1) blocks with k units per block and
r = k + 1 replicates of each treatment. Such a design is called a balanced lat-
tice. In a balanced lattice, the number of treatments is always an exact square
and the size of the block is the square root of this number. The incomplete blocks
of a lattice design are grouped to form separate replications. In a balanced
lattice, every pair of treatments occurs once in the same incomplete block. This
allows the same degree of precision for all treatment pairs being compared. Lat-
tice designs may involve a large number of treatments, and in order to reduce
the size of the design, partially balanced lattice designs are also used. Further
details of lattice designs are given in Kempthorne (1952), Federer (1955),
and Cochran and Cox (1957). The SAS PROC LATTICE performs the analysis
of variance and analysis of simple covariance using experimental data obtained
from a lattice design. The procedure analyzes data from balanced square lat-
tices, partially balanced square lattices, and some other rectangular lattices.
For further information and applications of PROC LATTICE, see SAS Institute
(1997, Chapter 14).
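For a prime k, one standard way to obtain a balanced lattice (equivalent to using k - 1 mutually orthogonal Latin squares) is to treat the k² treatments as the points of a k x k grid and take as blocks the k + 1 families of parallel "lines" of the grid. The Python sketch below illustrates that construction under the stated assumption that k is prime; it is not taken from the references above, and the function name is hypothetical. It also checks the two properties stated in the text: b = k(k + 1) blocks, and every pair of treatments together in exactly one block.

```python
from collections import Counter
from itertools import combinations


def balanced_lattice(k):
    """Replicates of a k x k balanced lattice for prime k.

    Treatment (x, y) on a k x k grid is numbered x * k + y. The k + 1 replicates
    are the k + 1 families of parallel lines of the grid: rows, columns, and
    the lines y = c + m*x (mod k) for slopes m = 1, ..., k - 1.
    """
    replicates = [
        [[x * k + y for y in range(k)] for x in range(k)],    # rows (x fixed)
        [[x * k + y for x in range(k)] for y in range(k)],    # columns (y fixed)
    ]
    for m in range(1, k):
        replicates.append([[x * k + (c + m * x) % k for x in range(k)] for c in range(k)])
    return replicates


reps = balanced_lattice(3)                     # 9 treatments
blocks = [block for rep in reps for block in rep]
together = Counter(p for block in blocks for p in combinations(sorted(block), 2))
print(len(reps), len(blocks), set(together.values()))   # 4 12 {1}
```

Each replicate contains every treatment exactly once, which is the grouping of incomplete blocks into separate replications mentioned above.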


YOUDEN SQUARES 


Youden squares are constructed by a rearrangement of certain balanced
incomplete block designs and possess the property of "two-way control" of
Latin squares. They are special types of incomplete Latin squares in which the
numbers of columns, rows, and treatments are not all equal. If a column or row
is deleted from a Latin square, the remaining layout is always a Youden square.
However, omission of two or more rows or columns does not in general produce
a Youden square. Youden squares can also be thought of as symmetrically
balanced incomplete block designs by means of which two sources of variation
can be controlled. These designs were developed by Youden (1937, 1940) in
investigations involving greenhouse experiments. The name Youden square was
given by Yates (1936b). The standard analysis of variance of a Youden square
design is similar to that of a balanced incomplete randomized block design.
A detailed treatment of planning and analysis of Youden squares is given in
Natrella (1963, Section 13.6). A table of Youden squares is given in Davies
(1960), and other types of incomplete Latin squares are discussed by Cochran
and Cox (1957, Chapter 13).
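The claim about deleting one row can be checked numerically. In the Python sketch below, the columns play the role of blocks; the cyclic Latin square and the function names are assumptions made for the illustration. Each column of the reduced layout lacks exactly one treatment, so every pair of treatments shares a column p - 2 times. Deleting two rows, by contrast, breaks this balance.

```python
from collections import Counter
from itertools import combinations


def cyclic_latin(p):
    """p x p cyclic Latin square with treatments numbered 0 .. p-1."""
    return [[(i + j) % p for j in range(p)] for i in range(p)]


def column_concurrences(layout):
    """Count how often each pair of treatments shares a column (columns = blocks)."""
    p = len(layout[0])
    columns = [[row[j] for row in layout] for j in range(p)]
    return Counter(pair for col in columns for pair in combinations(sorted(col), 2))


square = cyclic_latin(7)
youden = square[:-1]                 # delete one row: 6 rows x 7 columns, 7 treatments
print(set(column_concurrences(youden).values()))   # {5}: every pair together p - 2 times
```

Each remaining row still contains all seven treatments, so the rows give the second direction of control.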


CROSS-OVER DESIGNS 


In most experimental designs, each subject is assigned to only a single treatment
during the entire course of the experiment. In a cross-over design, the total
duration of the experiment is divided into several periods, and the treatment of
each subject changes from each period to the next. In a cross-over study
involving k treatments, each treatment is allocated to an equal number of
subjects, and each subject receives the k treatments in k different time
periods. Since the order in which treatments are assigned to experimental units
may affect the apparent effectiveness of the different treatments, the order of
treatments is chosen randomly so as to guard against order effects. This type
of design is particularly suited for animal and human subjects. The interval
between the assignment of different treatments depends on the objectives of the
experiment and other experimental considerations. For example, suppose that in
an experiment involving human subjects, the effects of two treatments are
investigated. In the first period, half of the subjects are randomly assigned
to treatment 1 and the other half to treatment 2. At the end of the study
period, the subjects are evaluated for the desired response, and sufficient
time is allowed for the biological effect of each treatment to be eliminated.
In the second period, the subjects who were assigned treatment 1 are given
treatment 2 and vice versa. Cross-over designs can be analyzed as a set of
Latin squares with rows as time periods, columns as subjects, and treatments
as letters. Cross-over designs have been used successfully in clinical trials,
bioassay, and animal nutrition experiments. For further discussions of
cross-over designs see Cochran and Cox (1957, Section 4.4), Cox (1958a,
Chapter 13), John (1971, Chapter 6), John and Quenouille (1977, Chapter 11),
Fleiss (1986, Chapter 10), Jones and Kenward (1989), Senn (1993), and
Ratkowsky et al. (1993).
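The randomized allocation of treatment orders described above can be sketched in Python. This is a minimal illustration, assuming a cyclic k x k Latin square whose rows are time periods and whose columns are subjects (or groups of subjects), with the columns randomly permuted; the function name `crossover_sequences` is hypothetical. With k = 2 it gives the two-period design of the example: one subject group receives treatment 1 then 2, the other the reverse.

```python
import random


def crossover_sequences(treatments, seed=None):
    """Return a k x k cyclic Latin square of treatment sequences.

    Row p lists the treatment given in period p to each subject (column).
    Each subject receives every treatment exactly once, and each treatment
    appears exactly once in every period. The columns (sequences) are
    shuffled so that the order assigned to each subject is random.
    """
    k = len(treatments)
    rng = random.Random(seed)
    # Sequence number s starts the cycle at treatment s: t_s, t_(s+1), ...
    sequences = [[treatments[(start + period) % k] for period in range(k)]
                 for start in range(k)]
    rng.shuffle(sequences)  # randomize which subject gets which sequence
    # Transpose so that rows are periods and columns are subjects.
    return [[seq[period] for seq in sequences] for period in range(k)]


if __name__ == "__main__":
    for period, row in enumerate(crossover_sequences(["A", "B", "C"], seed=1), start=1):
        print(f"Period {period}: {row}")
```

A cyclic square does not balance first-order carry-over effects (each treatment is always followed by the same treatment); when carry-over cannot be ruled out, designs constructed for that purpose, such as Williams squares, are usually preferred.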


REPEATED MEASURES DESIGNS 


Any design involving k (k ≥ 2) successive measurements on the same subject
is called a repeated measures design. In a repeated measures design, subjects
are crossed with the factor involving repeated measures. The k measurements
may correspond to different times, trials, or experimental conditions. For
example, blood pressures may be measured at successive time periods, say, once
a week, for a group of patients attending a clinic; or animals may be injected
with different drugs, with measurements made after each injection. If possible,
the order of the k repeated measures should be selected randomly. Of course,
when the repeated measures correspond to successive points in time, it is not
possible to randomize their order. In repeated measures designs, each subject
acts as his or her own control. This helps to control for variability between
subjects, since the same subject is measured repeatedly. Thus, repeated
measures designs are used to control for the presence of many extraneous
factors while at the same time limiting the total number of experimental units.
A major concern in repeated measures designs is to ensure that no carry-over or
residual effects persist from the treatment at one time period to the response
at the next time period. Thus, as in the case of cross-over designs, sufficient
time must be allowed to eliminate any carry-over effect from the previous
treatment. When this cannot be achieved, cross-over designs are to be
preferred. It is important to point out that it is incorrect to analyze the
time dimension in repeated measures studies by the straightforward application
of the analysis of variance, since successive measurements on the same subject
are generally correlated, which can violate the assumptions underlying the
usual F tests. For a complete coverage of repeated measures designs, see Fleiss
(1986, Chapter 8), Maxwell and Delaney (1990), Winer et al. (1991), and Kirk
(1995). For a book-length treatment of the topic see Crowder and Hand (1990)
and Lindsey (1993). The latter work also includes a fairly extensive and
classified bibliography on repeated measures. Hedayat and Afsarinejad (1975,
1978) have given an extensive survey and bibliography on repeated measures
designs. For analysis of repeated measures data using the SAS procedures PROC
GLM and PROC MIXED, see Littell et al. (1996, Chapter 3).
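To make concrete how letting each subject act as his or her own control removes between-subject variability from the error term, here is a Python sketch of the sum-of-squares partition for an n x k table (rows = subjects, columns = conditions) under an additive model with one observation per cell. It is only an illustration: the function name and the numbers in the demo are hypothetical, and, as cautioned above, this simple univariate layout may be inappropriate when the repeated factor is time.

```python
def subjects_by_conditions_ss(data):
    """Partition the total SS of an n x k table (rows = subjects,
    columns = repeated conditions) into subjects, conditions, and residual,
    assuming an additive model with no subject x condition interaction.
    Returns {source: (sum of squares, degrees of freedom)}."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    subject_means = [sum(row) / k for row in data]
    condition_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_total = sum((y - grand) ** 2 for row in data for y in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_conditions = n * sum((m - grand) ** 2 for m in condition_means)
    ss_residual = ss_total - ss_subjects - ss_conditions
    return {
        "subjects": (ss_subjects, n - 1),
        "conditions": (ss_conditions, k - 1),
        "residual": (ss_residual, (n - 1) * (k - 1)),
        "total": (ss_total, n * k - 1),
    }


if __name__ == "__main__":
    # Hypothetical responses: 4 subjects (rows) measured under 3 conditions.
    table = subjects_by_conditions_ss([[6, 9, 8], [5, 8, 7], [2, 6, 4], [3, 7, 5]])
    for source, (ss, df) in table.items():
        print(f"{source:<11} SS = {ss:8.3f}  df = {df}")
    # Ignoring subjects would leave the between-subject SS in the error term.
    ss_sub, df_sub = table["subjects"]
    ss_res, df_res = table["residual"]
    print(f"error if subjects ignored: SS = {ss_sub + ss_res:.3f}, df = {df_sub + df_res}")
```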


HYPER-GRAECO-LATIN AND HYPER SQUARES


The principle of Latin and Graeco-Latin square designs can be further extended 
to control for four or more sources of variation. A hyper-Graeco-Latin square is
a design which can be used to control four sources of variation. The design can 
also be used to investigate simultaneous effects of five factors: rows, columns, 
Latin letters, Greek letters, and Hebrew letters, in a single experiment. The 
hyper-Graeco-Latin square design is obtained by juxtaposing or superimposing 
three Latin squares, one with treatments denoted by Greek letters, the second 
with treatments denoted by Latin letters, and the third with treatments denoted 
by Hebrew letters, such that each Hebrew letter appears once and only once
with each Greek letter and each Latin letter. The number of Latin squares that
can be combined in forming hyper-Graeco-Latin squares is limited. For example, no more
than three orthogonal 4 x 4 Latin squares can be combined and no more than 
four orthogonal 5 x 5 Latin squares can be combined. The sum of squares for- 
mulae for rows, columns, Greek letters, Latin letters, and Hebrew letters follow 
the same general pattern as the corresponding formulae in Latin and Graeco- 
Latin square designs. The concept of superimposing two or more orthogonal 
Latin squares in forming Graeco-Latin and hyper-Graeco-Latin squares can be 
extended even further. A p x p hypersquare is a design in which three or more 
orthogonal p x p Latin squares are superimposed. In general, one can investigate 
a maximum of p + 1 factors if a complete set of p - 1 orthogonal Latin squares
is available. In such a design, one would utilize all (p + 1)(p - 1) = p² - 1
degrees of freedom, so that an independent estimate of the error variance would
be required. Of course, the researcher must assume that there would be no in- 
teractions between factors when using hypersquares. For a detailed discussion 
of hyper-Graeco-Latin squares, and other hypersquares, see Federer (1955). 
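The statement that a complete set of p - 1 orthogonal Latin squares exhausts all p² - 1 degrees of freedom can be checked with a small Python sketch. It assumes p is prime and uses the standard construction in which square m has entry (m·i + j) mod p in row i and column j; the function names are hypothetical. For prime powers such as p = 4 this modular construction does not work, and arithmetic in a finite field is needed instead.

```python
from itertools import combinations


def mols_prime(p):
    """Complete set of p - 1 mutually orthogonal p x p Latin squares for a
    prime p: square m (m = 1, ..., p - 1) has entry (m*i + j) mod p."""
    return [[[(m * i + j) % p for j in range(p)] for i in range(p)]
            for m in range(1, p)]


def is_latin(square):
    """Every symbol 0..p-1 occurs once in each row and once in each column."""
    p = len(square)
    symbols = set(range(p))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(p)} == symbols for j in range(p))
    return rows_ok and cols_ok


def orthogonal(a, b):
    """Superimposing a and b yields every ordered pair of symbols exactly once."""
    p = len(a)
    return len({(a[i][j], b[i][j]) for i in range(p) for j in range(p)}) == p * p


if __name__ == "__main__":
    p = 5
    squares = mols_prime(p)
    print(len(squares), "squares, all Latin:", all(is_latin(s) for s in squares))
    print("pairwise orthogonal:",
          all(orthogonal(a, b) for a, b in combinations(squares, 2)))
    # rows + columns + (p - 1) squares = p + 1 factors, each with p - 1 df
    print("df used:", (p + 1) * (p - 1), "=", p * p - 1)
```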


MAGIC AND SUPER MAGIC LATIN SQUARES 


These are Latin square designs with additional restrictions placed on the group- 
ing of treatments within a Latin square in order to reduce the error term. For 
this purpose, additional smaller squares or rectangles are formed within a Latin 
square in order to remove additional variation from the error term. If the use of 
squares or rectangles to remove variation is done in only one direction, the de- 
sign is called a magic Latin square. If the technique is used to control variation 
in both directions, the design is called a super magic Latin square. These designs
were initially developed by Gertrude M. Cox and have been used in sugarcane 
research in Hawaii and at the Geneva Experimental Station in New York. 


SPLIT-SPLIT-PLOT DESIGN


In a split-plot design, each subplot may be further subdivided into a number 
of sub-subplots to which a third set of treatments corresponding to c levels 
of a factor C may be applied. In a split-split-plot design, three factors are 
assigned to the various levels of experimental units, using three distinct stages 
of randomization. The a levels of factor A are randomly assigned to whole-plots;
b levels of factor B are randomly assigned to subplots within a whole-plot; and 
c levels of factor C are randomly assigned to sub-subplots within a subplot. 
For such a design there will be three error variances: whole-plot error for the 
A treatments, subplot error for the B treatments, and sub-subplot error for 
the C treatments. The details of statistical analysis follow the same general 
pattern as that of the split-plot design. Finally, it should be noted that in a 
split-split-plot design, the three error sums of squares and their corresponding 
degrees of freedom would add up to the sum of squares and the degrees of 
freedom for the single error term if the experiment were conducted as a standard
randomized block design with the abc treatment combinations in each block. For
further information about the split-split-plot design, see Anderson and McLean (1974, Section 7.2) and Koch et al.
(1988). 
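The additivity of the three error terms can be checked with a short Python sketch of the usual degrees-of-freedom partition. It assumes the experiment is laid out in r randomized blocks (r is not named in the text above; it and the function name are introduced here for illustration only). Error (a) is the blocks x A interaction.

```python
def split_split_plot_df(r, a, b, c):
    """Usual degrees-of-freedom partition for a split-split-plot experiment
    in r randomized blocks, with a whole-plot, b subplot, and c sub-subplot
    treatments."""
    return {
        "blocks": r - 1,
        "A": a - 1,
        "error (a)": (r - 1) * (a - 1),           # blocks x A
        "B": b - 1,
        "AB": (a - 1) * (b - 1),
        "error (b)": a * (r - 1) * (b - 1),
        "C": c - 1,
        "AC": (a - 1) * (c - 1),
        "BC": (b - 1) * (c - 1),
        "ABC": (a - 1) * (b - 1) * (c - 1),
        "error (c)": a * b * (r - 1) * (c - 1),
    }


if __name__ == "__main__":
    r, a, b, c = 4, 3, 2, 2  # hypothetical numbers of blocks and factor levels
    df = split_split_plot_df(r, a, b, c)
    for source, d in df.items():
        print(f"{source:<12}{d:>4}")
    print(f"{'total':<12}{sum(df.values()):>4}")  # equals r*a*b*c - 1
    pooled = df["error (a)"] + df["error (b)"] + df["error (c)"]
    # Pooled error df equals the error df of an RBD with abc treatments.
    print("pooled error df:", pooled, "| RBD error df:", (r - 1) * (a * b * c - 1))
```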


2^p DESIGN AND FRACTIONAL REPLICATIONS


In many experiments involving a large number of factors, a very useful
factorial design for preliminary exploration is a 2^p design. This design has
p treatment factors, each having two levels, giving a total of 2^p treatment
combinations. Thus, in any replication of this design, 2^p experimental units
are required. In a 2^p design, there are p main effects, p(p - 1)/2 two-way
interactions, p(p - 1)(p - 2)/6 three-way interactions, etc., and finally one
p-way interaction. Note that all the main effects and each one of the
interactions have only one degree of freedom. If the design is replicated in
b blocks each containing 2^p experimental units, then there are
(2^p - 1)(b - 1) degrees of freedom available for the error term. If the
number of experimental units available is limited, it may not be possible to
replicate the design. In such a case there will be no degrees of freedom
available for the error term. However, if the higher-order interactions can be
assumed to be negligible, which is often the case, one can pool the sums of
squares for these interactions in order to obtain an estimate of the error
mean square. If some of the higher-order interactions are not zero, the F tests
for the main effects and the lower-order interactions will tend to be
conservative. If there is a large number of treatment factors and the
available resources are limited, it may be necessary to use a replication of
only a fraction of the total number of treatment combinations. In a design
involving a fractional replication, some of the effects cannot be estimated
separately since they are confounded (aliased) with one or more other effects.
Usually, the fractional replication is chosen so that the effects considered
to be of importance are confounded only with effects that can be assumed to be
negligible. For a complete discussion of 2^p and other factorial designs, and
their fractional replications, see Kempthorne (1952), Cochran and
Cox (1957), and Box et al. (1978).
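The effect counts and the confounding introduced by a fractional replication can be illustrated with a short Python sketch. It assumes the two levels are coded -1 and +1 and uses the half fraction defined by I = AB...P (the product of all p factors), which is one common choice rather than the only one; the function names are hypothetical. In the 2^(3-1) fraction the sign column for A is identical to that for BC, so those two effects cannot be separated.

```python
from itertools import product
from math import comb, prod


def effect_counts(p):
    """Number of k-factor effects in a 2^p factorial, C(p, k), each with 1 df."""
    return {k: comb(p, k) for k in range(1, p + 1)}


def half_fraction(p):
    """Runs (coded -1/+1) of the 2^(p-1) half fraction with I = AB...P."""
    return [run for run in product((-1, 1), repeat=p) if prod(run) == 1]


def contrast_column(runs, factors):
    """Sign column of the effect formed by the given factor indices (0 = A)."""
    return [prod(run[f] for f in factors) for run in runs]


if __name__ == "__main__":
    counts = effect_counts(3)
    print(counts, "total effects:", sum(counts.values()))  # 2^3 - 1 in all
    runs = half_fraction(3)
    print(len(runs), "runs:", runs)
    # Identical sign columns mean the two effects are aliased.
    print("A aliased with BC:",
          contrast_column(runs, (0,)) == contrast_column(runs, (1, 2)))
```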


10.8 USE OF STATISTICAL COMPUTING PACKAGES 


Completely randomized designs (CRD) can be analyzed exactly as the one-way
analysis of variance. The use of SAS, SPSS, and BMDP programs for this
analysis is described in Section 2.15. Similarly, randomized block designs
(RBD) can be analyzed exactly as the two-way analysis of variance with
n (n ≥ 1) observations per cell. The use of appropriate programs for this type
of analysis is described in Sections 3.17 and 4.17. Latin squares (LSD) and
Graeco-Latin squares (GLS) can be analyzed in the same way as three-way and
four-way crossed-classification models without interactions. For example, with
SAS, one can use either PROC ANOVA or PROC GLM for all of them. The key
statements for both procedures are the CLASS and MODEL statements. For the
CRD, these are


CLASS TRT; 
MODEL Y = TRT; 


and for the RBD, these are 


CLASS BLC TRT;
MODEL Y = BLC TRT; 


where TRT, BLC, and Y designate treatment, block, and response, respectively. 
Similarly, for the LSD, we have 


CLASS ROW COL TRT;
MODEL Y = ROW COL TRT; 


and, for the GLS, we have 


CLASS ROW COL GRG ROM; 
MODEL Y = ROW COL GRG ROM;


where ROW, COL, GRG, and ROM designate row, column, Greek letter, and 
Roman letter factors, respectively. For the split-plot design, these statements 
are 


CLASS BLC A B;
MODEL Y = BLC A BLC*A B BLC*B A*B;


where BLC stands for blocks (replications) and A and B are whole-plot and 
subplot treatments, respectively. 
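The LSD statements above fit an additive model with row, column, and treatment effects and no interactions. As a cross-check of what such an analysis computes, the following Python sketch evaluates the usual Latin square sums of squares by hand. It assumes a p x p square stored as two nested lists (responses and treatment letters), with p ≥ 3 so that the error has positive degrees of freedom; the function name and the 3 x 3 demo numbers are illustrative only, and the sketch is not a substitute for PROC ANOVA or PROC GLM.

```python
def latin_square_anova(y, trt):
    """ANOVA table for a p x p Latin square under the additive model
    y = mu + row + column + treatment + error (no interactions).

    y[i][j] is the response in row i, column j; trt[i][j] is the treatment
    label used in that cell.
    """
    p = len(y)
    n = p * p
    grand = sum(map(sum, y)) / n
    row_means = [sum(row) / p for row in y]
    col_means = [sum(y[i][j] for i in range(p)) / p for j in range(p)]
    trt_totals = {}
    for i in range(p):
        for j in range(p):
            trt_totals[trt[i][j]] = trt_totals.get(trt[i][j], 0.0) + y[i][j]
    trt_means = [total / p for total in trt_totals.values()]

    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss = {
        "rows": p * sum((m - grand) ** 2 for m in row_means),
        "columns": p * sum((m - grand) ** 2 for m in col_means),
        "treatments": p * sum((m - grand) ** 2 for m in trt_means),
    }
    ss["error"] = ss_total - ss["rows"] - ss["columns"] - ss["treatments"]
    df = {"rows": p - 1, "columns": p - 1, "treatments": p - 1,
          "error": (p - 1) * (p - 2)}

    table = {src: {"df": df[src], "ss": ss[src], "ms": ss[src] / df[src]} for src in ss}
    ms_error = table["error"]["ms"]
    for src in ("rows", "columns", "treatments"):
        # F ratio against the error mean square (infinite if the fit is exact).
        table[src]["F"] = table[src]["ms"] / ms_error if ms_error > 0 else float("inf")
    table["total"] = {"df": n - 1, "ss": ss_total}
    return table


if __name__ == "__main__":
    # Illustrative 3 x 3 square (not data from the text).
    letters = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
    response = [[0, 3, 7], [2, 6, 5], [5, 4, 8]]
    for source, entry in latin_square_anova(response, letters).items():
        print(source, entry)
```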
If some of the factors are to be treated as random and the researcher is
interested in estimating variance components, one may employ appropriate
procedures in SAS, SPSS, and BMDP for this purpose. For example, in a Latin
square with rows and columns regarded as random factors, the following SAS
code may be used to analyze the design with PROC MIXED:


PROC MIXED; 
CLASS ROW COL TRT; 
MODEL Y = TRT; 
RANDOM ROW COL; 
RUN; 
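For a balanced design such as a complete p x p Latin square with random rows and columns, the expected mean squares are E(MS rows) = σ²e + p·σ²row and E(MS columns) = σ²e + p·σ²col, which leads to the analysis of variance (method-of-moments) estimators sketched below in Python. PROC MIXED uses REML by default; for balanced data its estimates generally coincide with these whenever the latter are nonnegative. The truncation at zero and the function name are assumptions made for illustration.

```python
def latin_square_variance_components(ms_rows, ms_cols, ms_error, p):
    """ANOVA estimates of the variance components for a p x p Latin square
    with random rows and columns, from
        E(MS_rows) = sigma_e^2 + p * sigma_row^2,
        E(MS_cols) = sigma_e^2 + p * sigma_col^2.
    Negative estimates are truncated at zero."""
    return {
        "rows": max((ms_rows - ms_error) / p, 0.0),
        "columns": max((ms_cols - ms_error) / p, 0.0),
        "error": ms_error,
    }


if __name__ == "__main__":
    # Hypothetical mean squares from a 4 x 4 Latin square.
    print(latin_square_variance_components(10.0, 4.0, 2.0, 4))
    # {'rows': 2.0, 'columns': 0.5, 'error': 2.0}
```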


EXERCISES 


1. An experiment was designed to compare four different feeds in regard 
to the gain in weight of cattle. Twenty cattle were divided at random 
into four groups of five each and each group was placed on a different 
feed. After a certain duration of time, the weight gains in kilograms
for each of the cattle were recorded, and the data are given as follows.


Feed A   Feed B   Feed C   Feed D


34 64 111 96 
45 49 34 85 
35 52 122 91 
49 47 27 88 
44 58 29 94 


(a) Describe the mathematical model and the assumptions for the 
experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test for the hypothesis that the mean 
weight gains for all the feeds are the same. Use α = 0.05.

(d) If the hypothesis in part (c) is rejected, find 95 percent simulta- 
neous confidence intervals for the contrasts between each pair 
of feeds using Tukey’s and Scheffé’s methods. 

(e) Carry out the test for homoscedasticity at α = 0.01 by employing


(i) Bartlett’s test, 
(ii) Hartley’s test, 
(iii) Cochran’s test.


2. Three methods of teaching were compared to determine their comparative
effect on students’ learning ability. Thirty students of comparable
ability were randomly divided into three groups of 10 each, and each 
group received instruction using a different method. After completion 
of instruction, the learning score for each student was determined, and
the data are given as follows.


Method A Method B Method C 


161 179 134 
131 261 176 
186 311 153 
281 176 186 
213 196 131 
155 163 131 
221 221 157 
167 232 164 
191 264 175
216 259 133 


(a) Describe the mathematical model and the assumptions for the 
experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test for the hypothesis that the mean 
learning scores for the three methods are the same. Use α = 0.05.

(d) If the hypothesis in part (c) is rejected, find 95 percent simulta- 
neous confidence intervals for the three single contrasts between 
each pair of teaching methods using Tukey’s and Scheffé’s
procedures. Use α = 0.01.

(e) Carry out the test for homoscedasticity at α = 0.01 by employing


(i) Bartlett’s test,
(ii) Hartley’s test,
(iii) Cochran’s test.


3. Steel and Torrie (1980, p. 144) reported data from an experiment
conducted by F. R. Urey, Department of Zoology, University of Wisconsin,
on estrogen assay of several solutions that had been subjected to an
in vitro inactivation technique. Twenty-eight rats were randomly
assigned to six different solutions and a control group, and the uterine
weight of each rat was used as a measure of the estrogen activity. The
uterine weight in milligrams for each rat was recorded, and the data are
given as follows.


1       2       3       4       5       6       Control
84.4    64.4    75.2    88.4    56.4    65.6    89.8
116.0   79.8    62.4    90.2    83.2    79.4    93.8
84.0    88.0    62.4    73.2    90.4    65.6    88.4
68.6    69.4    73.8    87.8    85.6    70.2    112.6


Source: Steel and Torrie (1980, p. 144). Used with permission. 
(a) Describe the mathematical model and the assumptions for the
experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test for the hypothesis that the mean 
uterine weights for all the groups are the same. Use α = 0.05.

(d) If the hypothesis in part (c) is rejected, find a 95 percent confidence
interval for the single contrast comparing the control with the mean
of all the other six treatments, and interpret your results. Use
α = 0.05.

(e) Carry out the test for homoscedasticity at α = 0.01 by employing


(i) Bartlett’s test, 
(ii) Hartley’s test,
(iii) Cochran’s test.


4. Fisher and McDonald (1978, p. 45) reported data from an experiment 
designed to study the effect of experience on errors in the reading of 
chest x-rays. Ten radiologists participated in the study and were clas- 
sified into one of three groups: senior staff, junior staff, and residents. 
Each radiologist was asked whether the left ventricle was normal and 
the response was compared to the results of ventriculography. The 
percentage of errors for each radiologist was determined and the data 
given as follows. 


Senior Staff   Junior Staff   Residents
7.3            13.3           14.7
7.4            10.6           23.0
               15.0           22.7
               20.7           26.6


Source: Fisher and McDonald (1978, p. 46). 
Used with permission. 


(a) Describe the mathematical model and the assumptions for the 
experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test for the hypothesis that the mean 
percentage errors for the three groups of radiologists are the 
same. Use α = 0.05.

(d) Transform each percentage to its arcsine value and then perform 
a second analysis of variance on the transformed data, comparing 
the results with those obtained in part (c). 

5. Lorenzen and Anderson (1993, p. 46) reported data from an experiment
designed to study the effect of honey on haemoglobin in children. A
completely randomized design was used with 12 children: 6 were given a
tablespoon of honey added to a cup of milk, and 6 were not given honey,
over a period of six consecutive weeks. The data are given as follows.


Honey   Control
19      14
12      8
9       4
17      4
24      11
22      15


Source: Lorenzen and Anderson 
(1993, p. 46). Used with 
permission. 


(a) Describe the mathematical model and the assumptions for the 
experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test for the hypothesis that the mean 
haemoglobin levels in the two groups of children are the same. Use
α = 0.05.

(d) Carry out the test for homoscedasticity using Snedecor’s F
test. Use α = 0.05.

6. A randomized block experiment was conducted with five treatments 
and five blocks. The following table gives partial results on analysis 
of variance. 


Source of    Degrees of   Sum of    Mean
Variation    Freedom      Squares   Square   F Value   p-Value
Treatment                 150.2
Block                     49.1
Error
Total                     295.2


(a) Describe the mathematical model and the assumptions for the 
experiment. 

(b) Complete the analysis of variance table. 

(c) Perform an appropriate F test for the hypothesis that all the 
treatment means are the same. Use α = 0.05.

7. An experiment is designed to compare the mileage of four brands of
gasoline. Inasmuch as mileage will vary according to road and other driving
conditions, five different categories of driving conditions are included
in the experiment. A randomized block design is used, and each brand
of gasoline is randomly selected to fill five cars. Finally, each car is
randomly assigned a given driving condition. The data are given as
follows.

                   Brand of Gasoline
Driving
condition    B1     B2     B3     B4
D1           48.1   39.1   51.6   49.1
D2           34.6   46.1   45.6   35.1
D3           47.6   48.6   41.6   39.6
D4           30.6   43.6   39.6   31.6
D5           39.6   45.1   44.1   21.1

(a) Describe the mathematical model and the assumptions for the
experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Do the brands have a significant effect on the mileage? Use
α = 0.05.

(d) Do the driving conditions have a significant effect on mileage?
Use α = 0.05.

(e) If there are significant differences in mileage due to brands, use
a suitable multiple comparison procedure to determine which
brands differ. Use α = 0.01.

8. An agricultural experiment was designed to study the potential for
grain yield of four different varieties of wheat. A randomized block
design with five blocks was used, and each variety was planted in each
of the blocks. The data on yields are given as follows.

                  Variety
Block    I       II      III     IV
1        152.4   153.4   150.9   145.5
2        154.1   152.8   154.4   146.1
3        154.4   156.3   155.3   149.8
4        155.1   156.1   152.3   148.1
5        156.5   154.6   155.2   148.9

(a) Describe the mathematical model and the assumptions for the
experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Do the varieties have a significant effect on the yield? Use
α = 0.05.

(d) Do the blocks have a significant effect on the yield? Use
α = 0.05.

(e) If there are significant differences in yields due to varieties, use
a suitable multiple comparison method to determine which varieties
differ. Use α = 0.01.
9. An experiment was designed to study the reaction time among rats
under the influence of three different treatments. Four rats were chosen
for the experiment, and the three treatments were administered to each rat
on three different days; the order in which each rat received the
treatments was random. The data on reaction time in seconds are given
as follows.

         Treatment
Rat      A       B       C
1        6.8     11.5    8.4
2        5.6     9.7     7.3
3        2.2     7.4     3.9
4        3.7     8.3     5.7

(a) Describe the mathematical model and the assumptions for the
experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Do the treatments have a significant effect on the reaction time?
Use α = 0.05.

(d) Do the rats have a significant effect on the reaction time? Use
α = 0.05.

(e) If there are significant differences in reaction time due to
treatments, use a suitable multiple comparison method to determine
which treatments differ. Use α = 0.01.

10. Anderson and Bancroft (1952, p. 245) reported data from an experiment
conducted by Middleton and Chapman at Laurinburg, North Carolina, to
compare eight varieties of oats. The experiment involved a randomized
block design with five blocks, and the yields of grain in grams for a
16-foot row were recorded. The data are given as follows.

                                Variety
Block   I     II    III   IV    V     VI    VII   VIII
1       296   402   437   303   469   345   324   488
2       357   390   334   319   405   342   339   374
3       340   431   426   310   442   358   357   401
4       331   340   320   260   487   300   352   338
5       348   320   296   242   394   308   220   320


Source: Anderson and Bancroft (1952, p. 245). Used with permission. 


(a) Describe the mathematical model and the assumptions for the 
experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test for the hypothesis that the mean 
yields for all the varieties are the same. Use α = 0.05.

(d) If there are significant differences in mean yields among the
varieties, use a suitable multiple comparison procedure to determine
which varieties differ. Use α = 0.01.

(e) What is the efficiency of this design compared with a completely 
randomized design? 

11. Fisher and McDonald (1978, p. 66) reported data from an experiment 
designed to study the effect of different heat treatments of the dietary 
protein of young rats on the sulfur-containing free amino acids in 
the plasma. The experiment involved a randomized block design with 
three blocks and six different treatments of heated soybean protein. 
The plasma-free cystine levels (μmoles/100 ml) in rats fed on different
treatments were recorded, and the data are given as follows (where
each observation is the average for four rats).


Heat Treatment 


Block   I     II    III   IV    V     VI
1       4.0   4.0   4.1   3.8   4.5   3.8
2       4.6   5.7   5.2   4.9   5.6   5.3
3       4.9   6.1   5.4   5.2   5.9   5.7


Source: Fisher and McDonald (1978, p. 66). Used with 
permission. 


(a) Describe the mathematical model and the assumptions for the 
experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test for the hypothesis that the mean 
levels of plasma-free cystine in rats fed on different treatments
are the same. Use α = 0.05.

(d) If there are significant differences in mean levels of plasma-free
cystine due to treatments, use a suitable multiple comparison
method to determine which treatments differ. Use α = 0.01.

(e) What is the efficiency of this design compared with a completely 
randomized design? 

12. John (1971, p. 64) reported data from an experiment involving a ran- 
domized block design with three blocks and 12 treatments including a 
control. The yields, in ounces, of cured tobacco leaves were recorded 
and the data are given as follows. 


                             Treatment
Block   I    II   III  IV   V    VI   VII  VIII IX   X    XI   Control
1       76   82   76   70   76   70   82   88   81   74   67   79
2       70   70   73   74   73   83   74   65   67   67   67   78
3       80   73   77   62   86   84   80   80   81   76   79   63


Source: John (1971, p. 64). Used with permission. 


The Analysis of Variance


(a) Describe the mathematical model and the assumptions for the
experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Perform an appropriate F test for the hypothesis that the mean
yields for all the treatments are the same. Use α = 0.05.

(d) If the hypothesis in part (c) is rejected, use Dunnett’s procedure
to test differences between the control and each of the other
treatment means. Use α = 0.01.

(e) As an alternative to Dunnett’s procedure, one might wish to
compare the control versus the mean of the other 11 treatments. Set
up the necessary contrast and test the implied null hypothesis.
Use α = 0.01.

13. John and Quenouille (1977) reported data from a randomized block
experiment to test the efficacy of five levels of application of potash on
the Pressley strength index of cotton. The levels of potash consisted of
pounds of K2O per unit area, expressed as units, and the experiment
was carried out in three blocks. The data are as follows.

               Treatment
Block   I      II     III    IV     V
1       7.62   8.14   7.76   7.17   7.46
2       8.00   8.15   7.73   7.57   7.68
3       7.93   7.87   7.74   7.80   7.21

Source: John and Quenouille (1977). Used with
permission.

(a) Describe the mathematical model and the assumptions for the
experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Do the treatments have a significant effect on the strength of
cotton? Use α = 0.05.

(d) Do the blocks have a significant effect on the strength of cotton?
Use α = 0.05.

(e) If there are significant differences in mean levels of the Pressley
index of cotton, use a suitable multiple comparison method to
determine which treatments differ. Use α = 0.01.

14. Snee (1985) reported data from an experiment designed to investigate
the effect of a drug added to the feed of chicks on their growth.
There were three treatments: standard feed (control group), standard
feed and a low dose of drug, and standard feed and a high dose of drug.
Each experimental unit consisted of a group of chicks fed and reared in
the same bird house. Eight blocks of three experimental units each were
laid out with physically adjacent units assigned to the same block. The
data are given in the following, where each observation is the average
weight (lbs) per bird at maturity.

              Treatment
Block   Control   Low dose   High dose
1       3.93      3.99       3.96
2       3.78      3.96       3.94
3       3.88      3.96       4.02
4       3.93      4.03       4.06
5       3.84      4.10       3.94
6       3.75      4.02       4.09
7       3.98      4.06       4.17
8       3.84      3.92       4.12

Source: Snee (1985). Used with permission.

(a) Describe the mathematical model and the assumptions for the
experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Perform an appropriate F test for the hypothesis that the mean
weights of chicks fed on different treatments are the same. Use
α = 0.05.

(d) If there are significant differences in mean weights due to
treatments, use a suitable multiple comparison method to determine
which treatments differ. Use α = 0.01.

(e) What is the efficiency of this design compared with a completely
randomized design?

15. Steel and Torrie (1980, p. 202) reported unpublished data, courtesy
of R. A. Linthurst and E. D. Seneca, North Carolina State University,
Raleigh, North Carolina (paper title "Aeration, Nitrogen, and Salinity
as Determinants of Spartina alterniflora Growth Response"), who
conducted a greenhouse experiment on the growth of Spartina alterniflora
in order to study the effects of salinity, nitrogen, and aeration. The
dried weight of all aerial plant material was recorded, and the data are
given as follows.

                               Treatment*
Block   I      II     III    IV     V      VI     VII    VIII   IX
1       11.8   18.8   21.3   83.3   8.8    26.2   20.4   50.2   2.2
2       8.1    15.8   22.3   25.3   8.1    19.5   8.5    47.7   3.3
3       22.6   37.1   19.8   55.1   2.1    17.8   8.2    16.4   11.1
4       4.1    22.1   49.0   47.6   10.0   20.3   4.8    25.8   2.7

15.3
10.2

* Treatment combinations are defined as follows.
Treatment Code
Number            I    II   III  IV   V    VI   VII  VIII IX   X    XI   XII

Salinity          15   15   15   15   30   30   30   30   45   45   45   45
(parts/thousand)

Nitrogen          0    0    168  168  0    0    168  168  0    0    168  168
(kg/hectare)

Aeration          0    1    0    1    0    1    0    1    0    1    0    1
(0 = none, 1 = saturation)


Source: Steel and Torrie (1980, p. 202). Used with permission. 


(a) Describe the mathematical model and the assumptions for the 
experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test for the hypothesis that there are 
no differences in response due to treatments. Use α = 0.05.

(d) Perform an appropriate F test for the hypothesis that there are
no differences in response due to differences in salinity. Use
α = 0.05.

(e) Perform an appropriate F test for the hypothesis that there are no
differences in response due to differences in nitrogen treatments.
Use α = 0.05.

(f) Perform an appropriate F test of the hypothesis that there are no
differences in response due to differences in aeration treatments.
Use α = 0.05.

(g) Is the nitrogen contrast orthogonal to the aeration contrast? 

16. A researcher wants to study two treatment factors A and C using a
factorial arrangement along with a randomized block design. Assume
that the factors A and C have a and c levels, respectively, giving a total
of ac treatment combinations. There are b blocks, and the ac treatment
combinations are randomly assigned to the ac experimental units within
each block. The mathematical model for this design is given by

$$
y_{ijk} = \mu + \beta_i + \alpha_j + \gamma_k + (\alpha\gamma)_{jk} + e_{ijk},
\qquad i = 1, 2, \ldots, b;\; j = 1, 2, \ldots, a;\; k = 1, 2, \ldots, c,
$$

where $y_{ijk}$ is the observed response corresponding to the i-th block, the
j-th level of factor A, and the k-th level of factor C; $-\infty < \mu < \infty$
is the overall mean, $\beta_i$ is the effect of the i-th block, $\alpha_j$ is the
effect of the j-th level of factor A, $\gamma_k$ is the effect of the k-th level
of factor C, $(\alpha\gamma)_{jk}$ is the interaction between the j-th level of
factor A and the k-th level of factor C, and $e_{ijk}$ is the customary error term.

(a) State the assumptions of the model if all the effects are considered 
to be fixed. 

(b) State the assumptions of the model if all the effects are considered 
to be random. 
(c) State the assumptions of the model if the block effects are con-
sidered to be random and A and C effects are considered to be 
fixed. 

(d) Report the analysis of variance table including expected mean 
squares under the assumptions of fixed, random, and mixed mod- 
els as stated in parts (a) through (c). 

(e) Assuming normality for the random effects in parts (a) through 
(c), develop tests of hypotheses for testing the effects correspond- 
ing to the block, factors A and C, and the A x C interaction. 

(f) Determine the estimators of the variance components based on 
the analysis of variance procedure under the assumptions of the 
random and mixed model. 

17. A psychological experiment was designed to study the effect of five
learning devices. In order to control for directional bias on learning,
a Latin square design was used with five subjects using five different
orders. The test scores are given in the following, where the letter within
parentheses represents the learning device used.


                       Order of Test
Subject   1         2         3         4         5
1         105 (D)   195 (C)   185 (A)   135 (E)   170 (B)
2         165 (C)   185 (B)   150 (E)   190 (A)   150 (D)
3         155 (A)   150 (D)   185 (C)   155 (B)   85 (E)
4         165 (B)   195 (E)   135 (D)   110 (C)   105 (A)
5         245 (E)   240 (A)   170 (B)   175 (D)   135 (C)


(a) Describe the mathematical model and the assumptions for the 
experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test for the hypothesis that the mean 
test scores for all the devices are the same, and state which device 
you would recommend for use. Use α = 0.05.

(d) Use Tukey’s multiple comparison method to determine whether
there are significant differences among the top three learning
devices. Use α = 0.01.

(e) Comment on the usefulness of the Latin square design in this
case. 

(f) What is the efficiency of the design compared to the randomized 
block design? 

18. An experiment was designed to compare grain yields of five different
varieties of corn. A 5 x 5 Latin square design was used to control for
fertility gradients due to rows and columns. The data on yields are
given as follows, where the letter within parentheses represents the
variety of corn.

                        Column
Row   1          2          3          4          5
1     65.9 (A)   66.0 (D)   68.0 (B)   67.4 (E)   63.4 (C)
2     68.2 (B)   67.6 (E)   68.6 (A)   68.6 (C)   66.4 (D)
3     68.6 (C)   68.3 (B)   68.0 (E)   67.2 (D)   69.3 (A)
4     64.1 (D)   62.9 (A)   66.2 (C)   67.0 (B)   66.8 (E)
5     61.2 (E)   62.7 (C)   61.6 (D)   67.8 (A)   64.9 (B)


(a) Describe the mathematical model and the assumptions for the 
experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test for the hypothesis that the mean 
yields for all the varieties are the same, and state which variety 
you would recommend for planting. Use α = 0.05.

(d) Use Tukey’s multiple comparison method to determine whether
there are significant differences among the top three varieties.
Use α = 0.01.

(e) Comment on the usefulness of the Latin square design in this
case.

(f) What is the efficiency of the design compared to the randomized
block design? 
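Part (f) asks how much precision the second blocking factor bought. One commonly used estimate of the efficiency of a t x t Latin square relative to a randomized block design that keeps only one of the two blocking factors is [MS_dropped + (t - 1) MS_E] / (t MS_E), where MS_dropped is the mean square of the blocking factor the block design would omit. The sketch below assumes that formula, without the small-sample degrees-of-freedom correction some texts apply; the function name and the mean squares in the example are hypothetical.

```python
def relative_efficiency(ms_dropped, ms_error, t):
    """Estimated efficiency of a t x t Latin square relative to a
    randomized block design that omits one blocking factor.

    The block design's error mean square is estimated as
    [MS_dropped + (t - 1) * MS_error] / t  (no d.f. correction).
    """
    rcbd_error = (ms_dropped + (t - 1) * ms_error) / t
    return rcbd_error / ms_error


# Hypothetical mean squares for a 5 x 5 square (not taken from the data above).
print(relative_efficiency(ms_dropped=60.0, ms_error=20.0, t=5))  # 1.4
```

A value above 1 suggests the omitted factor was worth controlling; a value of 1.4, for instance, would mean the block design needs about 40% more replication to match the precision of the square.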

Anderson and Bancroft (1952, p. 247) reported data from an experiment
conducted at the University of Hawaii to compare six different legume
intercycle crops for pineapples. A Latin square design was used, and the
data on yields in 10-gram units are given as follows, where the letter
within parentheses represents the variety of the legume.


                               Column
Row     1          2          3          4          5          6
 1   220 (B)     98 (F)    149 (D)     92 (A)    282 (E)    169 (C)
 2    74 (A)    238 (E)    158 (B)    228 (C)     48 (F)    188 (D)
 3   118 (D)    279 (C)    118 (F)    278 (E)    176 (B)     65 (A)
 4   295 (E)    222 (B)     54 (A)    104 (D)    213 (C)    163 (F)
 5   187 (C)     90 (D)    242 (E)     96 (F)     66 (A)    122 (B)
 6    90 (F)    124 (A)    195 (C)    109 (B)     79 (D)    211 (E)


Source: Anderson and Bancroft (1952, p. 247). Used with permission. 


(a) Describe the mathematical model and the assumptions for the 
experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test for the hypothesis that the mean 
yields for all the varieties are the same. Use α = 0.05.

(d) Use Tukey’s multiple comparison method to make pairwise
comparisons among the three legumes with the highest yield. Use
α = 0.01.

(e) Comment on the usefulness of the Latin square design in this 
case. 

(f) What is the efficiency of the design compared to the randomized 
block design? 

Damon and Harvey (1987, p. 315) reported data from an experiment
conducted by Scott Werme of the Department of Veterinary and Animal
Sciences at the University of Massachusetts to study the effect of the
level of added okara in a ration on total digestible nutrients (TDN).
A 4 x 4 Latin square design involving four treatments (levels of added
okara), four sheep, and four periods was used. The data on the TDN
levels (in percent) in the total ration are given as follows, where the
letter within parentheses represents the treatment.
Sheep
Period 1 2 3 4 
1 61.60 (A) 75.83 (D) 68.17 (C) 65.61 (B) 
2 62.05 (B) 58.63 (A) 70.33 (D) 67.22 (C) 
3 66.91 (C) 68.10 (B) 58.43 (A) 71.98 (D) 
4 69.87 (D) 67.25 (C) 63.32 (B) 60.30 (A) 


Source: Damon and Harvey (1987, p. 315). Used with permission. 


(a) Describe the mathematical model and the assumptions for the 
experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test for the hypothesis that the mean 
TDN levels in the total ration for all the treatments are the same. 
Use α = 0.05.

(d) Use Tukey’s multiple comparison procedure to make pairwise
comparisons among the three top treatments. Use α = 0.01.

(e) Comment on the usefulness of the Latin square design in this 
case. 

(f) What is the efficiency of the design compared to the randomized 
block design? 
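For the Tukey comparisons in part (d), each treatment mean in a t x t Latin square is based on t observations, so the honestly significant difference is q·sqrt(MS_E / t), where q is the upper α point of the studentized range for t means and (t - 1)(t - 2) error degrees of freedom, read from a table. The sketch below assumes the reader supplies q; the helper name, the means, MS_E, and q shown are hypothetical placeholders, not values from the TDN data.

```python
import math
from itertools import combinations


def tukey_pairs(means, ms_error, n_per_mean, q_crit, top=3):
    """Compare the `top` largest treatment means pairwise with Tukey's HSD.

    q_crit is the studentized range critical value, supplied by the user.
    """
    hsd = q_crit * math.sqrt(ms_error / n_per_mean)
    best = sorted(means, key=means.get, reverse=True)[:top]
    results = []
    for a, b in combinations(best, 2):
        diff = abs(means[a] - means[b])
        results.append((a, b, diff, diff > hsd))
    return hsd, results


# Hypothetical inputs for a 4 x 4 square; q_crit must come from a
# studentized range table for the chosen alpha and error d.f.
means = {"A": 60.0, "B": 64.0, "C": 68.0, "D": 69.0}
hsd, pairs = tukey_pairs(means, ms_error=2.0, n_per_mean=4, q_crit=6.0)
print(f"HSD = {hsd:.3f}")
for a, b, diff, significant in pairs:
    print(a, b, diff, "significant" if significant else "not significant")
```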

Fisher (1958, pp. 267-268) reported data on root weights for mangolds
from five different treatments found by Mercer and Hall in 25 plots.
The following table gives the data in a Latin square layout, where the
letters A, B, C, D, and E, representing five different treatments, are
distributed randomly in such a way that each appears once in each row
and each column.
Column
Row 1 2 3 4 5 
1 376 (D) 371 (E) 355 (C) 356 (B) 335 (A) 
2 316 (B) 338 (D) 336 (E) 356 (A) 332 (C) 
3 326 (C) 326 (A) 335 (B) 343 (D) 330 (E) 
4 317 (E) 343 (B) 330 (A) 327 (C) 336 (D) 
5 321 (A) 332 (C) 317 (D) 318 (E) 306 (B) 


Source: Fisher (1958, pp. 267-268). Used with permission. 


(a) Describe the mathematical model and the assumptions for the
experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Perform an appropriate F test for the hypothesis that the mean
root weights for mangolds for all the treatments are the same.
Use α = 0.05.

(d) Use Tukey’s multiple comparison procedure to make pairwise
comparisons among the three treatments with the highest mean
weight. Use α = 0.01.

(e) Comment on the usefulness of the Latin square design in this
case.

(f) What is the efficiency of the design compared to the randomized 
block design? 

Steel and Torrie (1980, p. 225) reported data from an experiment
designed to study moisture content of turnip greens. A Latin square
design involving five plants, five leaf sizes, and five treatments was
used. Treatments were times of weighing, since moisture losses might
be anticipated in a 70°F laboratory as the experiment progressed. The
data on moisture content (in percent) are given as follows, where the
letter within parentheses represents a treatment.




              Leaf size (1 = smallest, 5 = largest)
Plant      1           2           3           4           5
  1     86.67 (E)   87.15 (D)   88.29 (A)   88.95 (C)   89.62 (B)
  2     85.40 (B)   84.77 (E)   85.40 (D)   87.54 (A)   86.93 (C)
  3     87.32 (C)   88.53 (B)   88.50 (E)   89.99 (D)   89.68 (A)
  4     84.92 (A)   85.00 (C)   87.29 (B)   87.85 (E)   87.08 (D)
  5     84.88 (D)   86.16 (A)   87.83 (C)   85.83 (B)   88.51 (E)


Source: Steel and Torrie (1980, p. 225). Used with permission. 


(a) Describe the mathematical model and the assumptions for the
experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Perform an appropriate F test for the hypothesis that the mean
moisture contents for all the treatments are the same. Use
α = 0.05.

(d) Use Tukey’s multiple comparison procedure to make pairwise
comparisons among the three treatments with the highest mois-
ture content. Use α = 0.01.

(e) Comment on the usefulness of the Latin square design in this
case. 

(f) What is the efficiency of the design compared to the randomized 
block design? 
An experiment was designed to study the effect of fertilizers on yields
of wheat. A 4 x 4 Graeco-Latin square design involving four fertilizers,
four varieties of wheat, four rows, and four columns was used.
The data are given as follows, where the Roman letter within parentheses
represents the fertilizer and the Greek letter represents the wheat
variety.


                          Column
Row        1             2             3             4
 1     135.4 (Cβ)    124.4 (Bγ)    114.6 (Dδ)    135.4 (Aα)
 2     114.2 (Bα)    105.2 (Cδ)    113.6 (Aγ)    124.4 (Dβ)
 3     114.0 (Aδ)    116.1 (Dα)    134.2 (Bβ)    119.6 (Cγ)
 4     114.9 (Dγ)    164.4 (Aβ)    134.5 (Cα)    118.9 (Bδ)


(a) Describe the mathematical model and the assumptions for the 
experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Perform an appropriate F test for the hypothesis that the mean 
yields for all the fertilizers are the same, and state which fertilizer 
you would recommend for use. Use α = 0.05.

(d) Perform an appropriate F test for the hypothesis that the mean
yields for all the varieties are the same, and state which variety
you would recommend for use. Use α = 0.05.

(e) Comment on the usefulness of the Graeco-Latin square design 
in this case. 
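What distinguishes the wheat layout from two unrelated Latin squares is that the Roman and Greek squares are orthogonal: each Roman-Greek pair occurs exactly once. The short Python check below tests both conditions; `is_latin` and `is_graeco_latin` are hypothetical helpers written for this illustration, and the layout is the wheat table above.

```python
def is_latin(square):
    """True if every row and every column contains each symbol exactly once."""
    t = len(square)
    symbols = set(square[0])
    if len(symbols) != t or any(len(row) != t for row in square):
        return False
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(t)} == symbols for j in range(t))
    return rows_ok and cols_ok


def is_graeco_latin(layout):
    """Each cell is (roman, greek). Both squares must be Latin and every
    (roman, greek) pair must appear exactly once (orthogonality)."""
    t = len(layout)
    roman = [[r for r, _ in row] for row in layout]
    greek = [[g for _, g in row] for row in layout]
    pairs = {cell for row in layout for cell in row}
    return is_latin(roman) and is_latin(greek) and len(pairs) == t * t


# Wheat layout: Roman letter = fertilizer, Greek letter = variety.
wheat = [
    [("C", "β"), ("B", "γ"), ("D", "δ"), ("A", "α")],
    [("B", "α"), ("C", "δ"), ("A", "γ"), ("D", "β")],
    [("A", "δ"), ("D", "α"), ("B", "β"), ("C", "γ")],
    [("D", "γ"), ("A", "β"), ("C", "α"), ("B", "δ")],
]
print(is_graeco_latin(wheat))  # True
```

Because four classifications (rows, columns, Roman, Greek) each use t - 1 degrees of freedom, the error term is left with only (t - 1)(t - 3) = 3 degrees of freedom when t = 4, which bears directly on part (e).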


An experiment was designed to study the effect of diet on cholesterol.
A Graeco-Latin square design involving five diets, five time periods,
five technicians, and five laboratories was used. Subjects were fed the
diets for different time periods, and cholesterol was measured. The
data are given as follows, where the Roman letter within parentheses
represents the diet and the Greek letter represents the period.
                               Laboratory
Technician     1             2             3             4             5
    1      175.5 (Aα)    165.5 (Bβ)    168.8 (Cγ)    155.5 (Dδ)    162.2 (Eε)
    2      157.7 (Bγ)    170.0 (Cδ)    167.7 (Dε)    160.0 (Eα)    170.0 (Aβ)
    3      170.0 (Cε)    161.5 (Dα)    165.5 (Eβ)    174.4 (Aγ)    162.2 (Bδ)
    4      154.4 (Dβ)    164.4 (Eγ)    171.1 (Aδ)    163.3 (Bε)    166.6 (Cα)
    5      160.0 (Eδ)    173.3 (Aε)    166.6 (Bα)    166.6 (Cβ)    163.3 (Dγ)


(a) Describe the mathematical model and the assumptions for the
experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Perform an appropriate F test for the hypothesis that the mean
cholesterol levels for all the diets are the same. Use α = 0.05.

(d) Comment on the usefulness of the Graeco-Latin square design
in this case.

An experiment was designed to compare the efficiencies of four dif- 
ferent operators using three different machines. A split-plot design 
was used where the output of one machine constitutes a whole-plot, 
which was then divided into four subplots for the four operators. 
The experiment was repeated four times, and the data are given as
follows.




Machine 1 Machine 2 Machine 3 


Operator Operator Operator 


1 2 3 4 1 2 3 4 1 2 3 4 


161.0 
160.7 
156.5 
150.5 


143.4 
136.5 
135.3 
142.2 


161.0 
165.6 
160.6 
154.5 


135.1 
140.5 
142.5 
146.0 


167.5 
160.8 
158.7 
158.5 


138.0 
142.7 
132.6 
147.7 


(a) Describe the mathematical model and the assumptions for the
experiment.

(b) Analyze the data and report the analysis of variance table.

(c) Test whether there are significant differences in output of the
three machines. Use α = 0.05.

(d) Test whether there are significant differences in output of the
four operators. Use α = 0.05.

(e) Is there a significant interaction effect between machines and
operators? Use α = 0.05.

(f) Comment on the usefulness of the split-plot design in this case.


26. Consider a split-plot design involving six treatments A1, A2, ..., A6
that are assigned at random to six whole-plots. Each whole-plot is then
divided into two subplots for testing treatments B1 and B2, involving
three replications. The data on yields are given as follows.


111) 107) 110) «61130 «102-0 127) 112 123) 125) 120-131-130 
107) 125 #117) 11506 «110 «©1119 «©1210 129) 127) 126-127-123 
118 123 109 #119 116 117) 113° 117) «©1230: 123)0« «122 ~—=«(116 


(a) Describe the mathematical model and the assumptions for the 
experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Test whether there are significant differences in yields of the 
whole-plot treatments. Use α = 0.05.

(d) Test whether there are significant differences in yields of the
subplot treatments. Use α = 0.05.

(e) Is there a significant interaction effect between whole-plot and
subplot treatments? Use α = 0.05.

(f) Comment on the usefulness of the split-plot design in this case. 

27. John (1971, p. 98) reported data from an experiment described by
Yates (1937b) carried out at the Rothamsted Experimental Station,
Harpenden, England, with two factors, varieties of oats and quantity
of manure. A split-plot design was used with the variety as the
whole-plot treatment and the manure as the subplot treatment. There
were six blocks of three plots each, and each plot was divided into four
subplots. One plot in each block was planted with each of the three
varieties of oats, and each subplot was assigned at random to one of
the four levels of manure. The data on yields are given as follows.


                                  Block
                    1                  2                  3
Manure Level     Variety            Variety            Variety
(Tons/Acre)    A1   A2   A3       A1   A2   A3       A1   A2   A3
No manure     111  117  105       74   64   70       61   70   96
0.01          130  114  140       89  103   89       91  108  124
0.02          157  161  118       81  132  104       97  126  121
0.03          174  141  156      122  133  117      100  149  144
                                  Block
                    4                  5                  6
Manure Level     Variety            Variety            Variety
(Tons/Acre)    A1   A2   A3       A1   A2   A3       A1   A2   A3
No manure      62   80   63       68   60   89       53   89   97
0.01           90   82   70       64  102  129       74   82   99
0.02          100   94  109      112   89  132      118   86  119
0.03          116  126   99       86   96  124      113  104  121


Source: John (1971, p. 99). Used with permission. 


(a) Describe the mathematical model and the assumptions for the 
experiment. 

(b) Analyze the data and report the analysis of variance table. 

(c) Test whether there are significant differences in yields of the six 
blocks. Use α = 0.05.

(d) Test whether there are significant differences in yields of the
three varieties. Use α = 0.05.

(e) Test whether there are significant differences in yields of the four
manures. Use α = 0.05.

(f) Is there a significant interaction effect between varieties and
manures? Use α = 0.05.

(g) Comment on the usefulness of the split-plot design in this case.
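Before any sums of squares are computed for the oats experiment, it helps to check how the degrees of freedom split between the whole-plot and subplot strata. The sketch below assumes the standard split-plot model with r blocks, a whole-plot treatments (varieties), and b subplot treatments (manure levels); the function name `split_plot_df` is hypothetical.

```python
def split_plot_df(r, a, b):
    """Degrees of freedom for a split-plot design in r randomized blocks."""
    df = {
        "blocks": r - 1,
        "whole-plot treatment (A)": a - 1,
        "whole-plot error (blocks x A)": (r - 1) * (a - 1),
        "subplot treatment (B)": b - 1,
        "A x B interaction": (a - 1) * (b - 1),
        "subplot error": a * (r - 1) * (b - 1),
    }
    df["total"] = r * a * b - 1
    # The strata must add up to the total degrees of freedom.
    assert sum(v for k, v in df.items() if k != "total") == df["total"]
    return df


# Oats experiment: 6 blocks, 3 varieties, 4 manure levels.
for source, d in split_plot_df(r=6, a=3, b=4).items():
    print(f"{source:<32}{d:>3}")
```

Under this model, blocks and varieties are tested against the whole-plot error, while manure and the variety x manure interaction are tested against the subplot error.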


11 Analysis of Variance Using Statistical Computing Packages


11.0 PREVIEW 


The widespread availability of modern high-speed mainframes and microcomputers,
together with the myriad accompanying software, has made it much simpler to
perform a wide range of statistical analyses. The use of statistical computing
packages or software can make it possible even for a relatively inexperienced
person to utilize computers to perform a statistical analysis. Although there
are numerous statistical packages that can perform the analysis of variance,
we have chosen to include in this volume three statistical packages that are
most widely used by scientists and researchers throughout the world and that
have become standards in the field.¹ The packages are the Statistical Analysis
System (SAS), the Statistical Product and Service Solutions (SPSS),² and the
Biomedical Programs (BMDP). In the following we provide a brief introduction
to these packages and their use for performing an analysis of variance and
related statistical tests of significance.³,⁴,⁵


11.1 ANALYSIS OF VARIANCE USING SAS 


SAS is an integrated system of software products for data management, report 
writing and graphics, business forecasting and decision support, applications 


1 A recent survey of faculty regarding their commercial software preference for advanced analysis
of variance courses found that the most frequently used packages were SAS, SPSS, and BMDP
(Tabachnick and Fidell, 1991).

2 The acronym SPSS originated from Statistical Package for the Social Sciences. However,
the expansion Statistical Product and Service Solutions applies to the SPSS acronym as used in
the company name. The package is simply known as SPSS.

3 For a discussion of computational algorithms and the construction of computer programs for the
analysis of variance as used in statistical packages for the analysis of designed experiments, see
Heiberger (1989).

4 For a listing of a series of analysis of variance and related programs that can be run on
microcomputer systems, see Wolach (1983).

5 For some further discussions of the use of SAS, SPSS, and BMDP software in performing
analysis of variance, including numerical examples illustrating various designs, see Collyer and
Enns (1987).
research, and project management. The statistical analysis procedures in the
SAS system are among the finest available. They range from simple descrip- 
tive statistics to complex multivariate techniques. One of the advantages of SAS 
software is the variety of procedures — from elementary analysis to the most 
sophisticated statistical procedures, of which GLM (General Linear Models) is 
the flagship — that can be performed. SAS software provides tremendous flex- 
ibility and is currently available on thousands of computer facilities throughout 
the world. 

For a discussion of instructions for creating SAS files and running SAS pro- 
cedures, the reader is referred to the SAS manuals and other publications docu- 
menting SAS procedures. The SAS manuals, SAS Language and Procedures: 
Usage (SAS Institute, 1989, 1991), SAS Language: Reference (SAS Institute, 
1990a), SAS Procedures Guide (SAS Institute, 1990b), and SAS/STAT User’s 
Guide (SAS Institute, 1990c), together provide exhaustive coverage of the SAS 
software. The reference and procedure manuals provide an introduction to data 
handling and data management procedures including some descriptive statis- 
tics procedures, and the STAT manual covers inferential statistical procedures. 
The SAS Introductory Guide for PCs (SAS Institute, 1992) is a very useful 
publication for the beginner which provides an elementary discussion of some 
basic and commonly used data management and statistical procedures, includ- 
ing several simple ANOVA designs. Some other related publications providing 
easy-to-use instructions for running SAS procedures together with a broad cov- 
erage of statistical techniques and interpretation of SAS data analysis include 
DiIorio (1991), Friendly (1991), Freund and Littell (1991), Littell et al. (1991),
Miron (1993), Spector (1993), Aster (1994), Burch and King (1994), Hatcher
and Stepanski (1994), Herzberg (1994), Jaffe (1994), Elliott (1995), Everitt
and Derr (1996), DiIorio and Hardy (1996), Cody and Smith (1997), and
Schlotzhauer and Littell (1997). 

There are several SAS procedures for performing an analysis of variance. 
PROC ANOVA is a very useful procedure that can be used for analyzing a 
wide variety of anova designs known as balanced designs that contain an equal 
number of observations in each submost subcell. However, this procedure is 
somewhat limited in scope and cannot be used for the unbalanced anova designs 
that contain an unequal number of observations in each submost subcell. More- 
over, in a multifactorial experiment, PROC ANOVA is appropriate when all the 
effects are fixed. PROC GLM, which stands for General Linear Models, is more 
general and can accommodate both balanced and unbalanced designs. However,
the procedure is more complicated, requiring more memory and execution
time; it runs much slower and is considerably more expensive to use than PROC
ANOVA. Two other SAS procedures, more appropriate for anova models
involving random effects, are NESTED and VARCOMP. PROC NESTED is
specially configured for anova designs where all factors are hierarchically
nested and involve only random effects. The NESTED procedure is
computationally more efficient than GLM for nested designs. Although PROC
NESTED is written for a completely random effects model, the computations of
the sums of squares and mean squares are the same for all the models. If some of
the factors are crossed or any factor is fixed, PROC ANOVA is more appropriate
for balanced data involving only fixed effects, and GLM for balanced or unbalanced
data involving random effects. PROC VARCOMP is especially designed
for estimating variance components and currently implements four methods of 
variance components estimation. In addition, SAS has recently introduced a 
new procedure, PROC MIXED, which fits a variety of mixed linear models 
and produces appropriate statistics to enable one to make statistical inference 
about the data. Traditional mixed linear models contain both fixed and random 
effects parameters, and PROC MIXED fits not only the traditional variance 
components models but models containing other covariance structures as well. 
PROC MIXED can be considered as a generalization of the GLM procedure in 
the sense that although PROC GLM fits standard linear models, PROC MIXED 
fits a wider class of mixed linear models. PROC GLM produces all Types I to 
IV tests of fixed effects, but PROC MIXED computes only Type I and Type III. 
Instead of producing traditional analysis of variance estimates, PROC MIXED 
computes REML and ML estimates and optionally computes MIVQUE (0) 
estimates which are similar to analysis of variance estimates. PROC MIXED 
subsumes the VARCOMP procedure except that it does not include the Type I 
method of estimating variance components. For further information and
applications of PROC MIXED to random and mixed linear models, see Littell et al.
(1996) and SAS Institute (1997).


Remark: The output from GLM produces four different kinds of sums of squares,
labelled as Types I, II, III, and IV, which can be thought of, respectively, as “sequential,”
“each-after-all-others,” “Σ-restrictions,” and “hypotheses.” The Type I sum of squares
of an effect is calculated in a hierarchical or sequential manner by adjusting each term
only for the terms that precede it in the model. The Type II sum of squares of an effect
is calculated by adjusting each term for all other terms that do not contain the effect in
question. If the model contains only the main effects, then each effect is adjusted for
every other term in the model. The Type III sum of squares of an effect is calculated
by adjusting for any other terms that do not contain it and are orthogonal to any effects
that contain it. The Type IV sum of squares is designed for situations in which there
are empty cells; for any effect in the model that is not contained in any other term,
Type IV = Type III = Type II. For balanced designs, all types of sums of squares are
identical; they add up to the total sum of squares and form a unique and orthogonal
decomposition. For unbalanced designs fitted with models containing no interactions,
Types II, III, and IV sums of squares are the same. Finally, for unbalanced designs
with no empty cells, Types III and IV are equivalent. Furthermore, output from GLM
includes a special form of estimable functions for each of the sums of squares listed
under Types I, II, III, and IV. The importance of the estimable functions is that each
provides a basis for formulating the associated hypothesis for the corresponding sum
of squares. For further information on sums of squares from GLM, see Searle (1987,
Section 12.2).
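The order dependence of Type I sums of squares in unbalanced data is easy to see numerically. The Python sketch below fits additive two-factor models by least squares (normal equations solved in plain Python, so only the standard library is needed) to a small, made-up unbalanced data set, and computes the sequential sum of squares for A when A is entered first and when it is entered after B. The data and the helper names `solve` and `rss` are hypothetical; this illustrates the definitions, not GLM's internal algorithm.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def rss(X, y):
    """Residual sum of squares from the least-squares fit of y on X."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Hypothetical unbalanced 2 x 2 data: (level of A, level of B, response).
data = [(1, 1, 10), (1, 1, 12), (1, 2, 15), (2, 1, 14),
        (2, 2, 20), (2, 2, 22), (2, 2, 21)]
y = [v for _, _, v in data]
one = [[1] for _ in data]                                   # intercept only
a_only = [[1, int(a == 2)] for a, _, _ in data]             # mu + A
b_only = [[1, int(b == 2)] for _, b, _ in data]             # mu + B
both = [[1, int(a == 2), int(b == 2)] for a, b, _ in data]  # mu + A + B

rss0, rss_a, rss_b, rss_ab = (rss(X, y) for X in (one, a_only, b_only, both))
ss_a_first = rss0 - rss_a      # Type I SS for A in MODEL Y = A B;
ss_a_after_b = rss_b - rss_ab  # Type I SS for A in MODEL Y = B A; (= Type II here)
print(f"SS(A) entered first:   {ss_a_first:.3f}")
print(f"SS(A) entered after B: {ss_a_after_b:.3f}")
```

With balanced data (equal counts in all four cells) the two printed values would coincide, in line with the remark that all types agree for balanced designs.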
The following is an example of SAS commands necessary to run the ANOVA
procedure: 


PROC ANOVA;
CLASS {list of factors};
MODEL {dependent variable(s)} = {list of effects};
RUN;


The class statement contains the keyword CLASS followed by a list of factors or 
variables of classification. The model statement contains the keyword MODEL 
followed by the list of dependent variable(s), followed by the equality symbol 
(=) which in turn is followed by the list of factors representing main effects 
as well as interaction effects. Interaction effects between different factors are 
designated by the factor names connected by an asterisk (*). The same com- 
mands are required for executing other analysis of variance procedures as well. 
Let Y designate the dependent variable and consider a factorial experiment in- 
volving three factors A, B, and C. The following is a typical model statement 
containing all the main and two-factor interaction effects:


MODEL Y = A B C A*B A*C B*C;


To analyze a factorial experiment with the full factorial model, a simpler way to
specify a model statement is to list all the factors separated by a vertical bar (|).
For example, the full factorial model involving factors A, B, C, and D can be 
written as 


MODEL Y = A|B|C|D;
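The bar operator is shorthand for every main effect and every interaction among the listed factors, 2^k - 1 effects for k factors. The following Python sketch (a hypothetical helper, not part of SAS) enumerates the effects that A|B|C|D stands for; SAS may list them in a different order, but the set of effects is the same.

```python
from itertools import combinations


def expand_bar(factors):
    """Effects implied by F1|F2|...|Fk: all main effects and interactions."""
    effects = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            effects.append("*".join(combo))
    return effects


effects = expand_bar(["A", "B", "C", "D"])
print(len(effects))  # 15
print(" ".join(effects))
```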


In terms of the choice of a fixed, random, or mixed effects model, the pro- 
cedures require that the user supply information about which factors are to be 
considered fixed and which random. In some cases, if appropriate specifica- 
tions for fixed and random effects are not provided, the analysis of variance 
tests provided by the procedure may differ from what the user wants them to 
be. In GLM, one can designate a factor as random by an additional statement 
containing the keyword RANDOM followed by the names of all effects, includ- 
ing interactions, which are to be treated as random. When some of the factors 
are designated as RANDOM, the GLM procedure will perform a mixed effects 
analysis including a default analysis by treating all factors as fixed. For ex- 
ample, given a mixed effects model with A fixed and B random, the following 
commands may be used to execute a GLM program 


PROC GLM;
CLASS A B;
MODEL Y = A B A*B;
RANDOM B A*B;
RUN;
The program will produce expected mean squares using “alternate” mixed
model analysis, but will use the error mean square for testing the hypotheses
for both the fixed factor A and the random factor B. If the analyst wants to
use an alternate test, the analyst has the option of designating the hypothesis
to be tested and the appropriate error term by employing the following command:


TEST H = {effect to be tested} E = {error term};


For example, if the analyst wants to use the interaction mean square for the 
error term for testing the hypothesis for the random factor B, the following 
commands may be used: 


PROC GLM;
CLASS A B;
MODEL Y = A B A*B;
TEST H = B E = A*B;
RUN;


If several different hypotheses are to be tested, multiple TEST statements can be 
used. It should be noted that the TEST statement does not override the default 
option involving the test of a hypothesis against the error term which is always 
performed in addition to the hypothesis test implied by the TEST statement. 


Remarks: (i) Given the specifications of the random and fixed effects via the RANDOM 
option, the GLM will print out the coefficients of the expected mean squares. The 
researcher can then determine how to make any test using expected mean squares. When 
exact tests do not exist, PROC GLM can still be used, followed by Satterthwaite’s
procedure.

(ii) The expected mean squares and tests of significance are computed based on
‘alternate’ mixed model theory without the assumption that fixed-by-random interactions 
sum to zero across levels of the fixed factor. Thus, the tests based on RANDOM option 
will differ from those based on ‘standard’ mixed model theory. 

(iii) The RANDOM statement provides for the following two options which can be 
placed at the end of the statement following a slash (/): 

Q: This option provides a complete listing of all quadratic forms in the fixed effects 
that appear in the expected mean square. 

TEST: This option provides that an F test be performed for each effect specified in 
the model using an appropriate error term as determined by the expected mean squares 
under the assumptions of the random or mixed model. When more than one mean square 
must be combined to determine the appropriate error term, a pseudo-F test based on 
Satterthwaite’s approximation is performed. 

(iv) For balanced designs, the ANOVA procedure can be used instead of GLM.
However, PROC ANOVA performs only the fixed effects analysis and does not allow
the option of the RANDOM statement.
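When no single mean square has the right expectation, the error term is synthesized as a linear combination of mean squares, and its degrees of freedom are approximated by Satterthwaite's formula, ν = (Σ c_i MS_i)² / Σ[(c_i MS_i)² / f_i], where f_i is the degrees of freedom of MS_i. For factor A in a three-factor random model, for example, the synthesized denominator is MS_AB + MS_AC - MS_ABC. The sketch below computes the approximation; the function name and all mean squares and degrees of freedom are hypothetical.

```python
def satterthwaite(coefs, mean_squares, dfs):
    """Return the synthesized mean square sum(c_i * MS_i) and its
    Satterthwaite approximate degrees of freedom."""
    terms = [c * ms for c, ms in zip(coefs, mean_squares)]
    combined = sum(terms)
    denom = sum(term ** 2 / df for term, df in zip(terms, dfs))
    return combined, combined ** 2 / denom


# Hypothetical values: denominator MS_AB + MS_AC - MS_ABC for testing A.
ms_ab, ms_ac, ms_abc = 12.0, 9.0, 4.0
df_ab, df_ac, df_abc = 4, 4, 8
ms_den, df_den = satterthwaite([1, 1, -1], [ms_ab, ms_ac, ms_abc],
                               [df_ab, df_ac, df_abc])
ms_a = 60.0
print(f"denominator MS = {ms_den}, approx. d.f. = {df_den:.2f}, "
      f"pseudo-F = {ms_a / ms_den:.3f}")
```

When the combination has a single term, the formula returns that mean square's own degrees of freedom, which is a quick sanity check.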


For using the NESTED procedure, no specifications for MODEL or RANDOM
statements are required since the procedure is designed for hierarchically
nested designs involving only random factors. The order of nesting of the factors
is indicated by the CLASS statement. In addition, a VAR statement is needed 
to specify the dependent variable. The following commands will execute the 
NESTED procedure where the factor B is nested within factor A: 


PROC NESTED;
CLASS A B;
VAR Y;
RUN;


The program involving the preceding statements will perform two tests of hy- 
potheses; one for the factor A effects against the mean square for B and the 
other for the factor B effects against the error mean square. The GLM procedure 
can also be used for analyzing designs involving nested factors. The nesting 
is indicated in the MODEL statement by the use of parentheses containing the 
factor within which the factor preceding the left parenthesis is being nested.
For example, to indicate that the factor B is nested within the factor A, the 
following MODEL statement could be used: 


MODEL Y = A B(A); 


If the factor B is random, one could perform the appropriate F test by using 
the RANDOM statement in the following SAS commands: 


PROC GLM;
CLASS A B;
MODEL Y = A B(A);
RANDOM B(A) / TEST;
RUN;


Alternatively, one could replace the preceding RANDOM statement by the 
following TEST option: 


TEST H = A E = B(A);
Multiple nesting involving several factors, say, A, B, C, and D, where D is
nested within C, C within B, and B within A, is indicated by the following
MODEL statement:
MODEL Y = A B(A) C(B A) D(C B A);


If factors C and D are crossed rather than nested, the above MODEL statement 
will be modified as: 


MODEL Y = A B(A) C*D(B A);
For the problems involving estimation of variance components, PROC VARCOMP
is a better choice than other analysis of variance procedures. It produces
estimates of variance components assuming all factors are random. For per- 
forming a mixed model analysis, the user must indicate the factors that are to be 
treated as random. All the SAS commands involving the CLASS and MODEL 
statements are used the same way as in other procedures. For example, the fol- 
lowing commands can be used to estimate the variance components associated 
with the random factors A, B, and C and their interaction effects. 


PROC VARCOMP;
CLASS A B C;
MODEL Y = A|B|C;
RUN;


It should be pointed out that PROC VARCOMP does not perform F tests. However,
with the METHOD=TYPE1 option it does produce sums of squares and mean squares
that can be readily used to perform the required F tests manually.
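As an example of such manual work, consider a balanced two-stage nested design with B random within A random (the layout handled by PROC NESTED above), with b levels of B within each level of A and n observations per level of B. Equating the mean squares to their expectations gives the analysis of variance estimators, and the same mean squares give the two F tests described for PROC NESTED. The sketch below assumes the mean squares are already available; the function name and numbers are hypothetical.

```python
def nested_anova_estimates(ms_a, ms_b_in_a, ms_error, b, n):
    """ANOVA (method-of-moments) estimators for the balanced random model
    y_ijk = mu + alpha_i + beta_j(i) + e_ijk, using
      E(MS_E)    = s2_e
      E(MS_B(A)) = s2_e + n * s2_b
      E(MS_A)    = s2_e + n * s2_b + b * n * s2_a
    Estimates can come out negative; they are returned unmodified."""
    s2_e = ms_error
    s2_b = (ms_b_in_a - ms_error) / n
    s2_a = (ms_a - ms_b_in_a) / (b * n)
    f_a = ms_a / ms_b_in_a        # test for A uses MS_B(A) as the error term
    f_b = ms_b_in_a / ms_error    # test for B(A) uses MS_E
    return {"s2_a": s2_a, "s2_b": s2_b, "s2_e": s2_e, "F_A": f_a, "F_B(A)": f_b}


print(nested_anova_estimates(ms_a=50.0, ms_b_in_a=14.0, ms_error=4.0, b=2, n=2))
```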

PROC MIXED employs the same specifications for the CLASS statement 
as GLM but differs in the specifications of the MODEL statement. The right 
side of the MODEL statement now contains only the fixed-effect factors. The
random-effect factors do not appear in the MODEL statement; instead they
are listed under the RANDOM statement. Thus, the MODEL and RANDOM
statements are the core statements in the application of the PROC MIXED
procedure. Given two factors A and B with A fixed and B random, the following 
commands can be used to run the procedure: 


PROC MIXED;
CLASS A B;
MODEL Y = A;
RANDOM B A*B;
RUN;


The program will perform significance tests for fixed effects (in this case for fac- 
tor A) and will compute REML estimates of the variance components including 
the error variance component. 


Remark: PROC MIXED computes variance components estimates for random effect 
factors listed in the RANDOM statement including the error variance component; and 
performs significance tests for fixed effects specified in the MODEL statement. By 
default, the significance tests are based on the likelihood principle and are equivalent to the
conventional F tests for balanced data. Variance components estimates
are computed using the restricted maximum likelihood (REML) procedure (default 
option). For balanced data sets with nonnegative estimates, the REML estimates are 
identical to the traditional anova estimates. However, for designs with unbalanced 
structure, REML estimates generally differ from the anova estimates. Furthermore, 
unlike the ANOVA and GLM procedures, PROC MIXED does not directly compute or 
print sums of squares. Instead, it shows REML estimates of variance components and 
prints a separate table of tests of fixed effects that contains results of significance tests
for the fixed-effect factors specified in the MODEL statement. 


11.2 ANALYSIS OF VARIANCE USING SPSS 


Statistical Package for the Social Sciences was initially developed by N. H. Nie 
and his coworkers at the National Opinion Research Center at the University of 
Chicago. It was officially shortened to SPSS when the first version of SPSS-X
was released in 1983. In Release 4.0 of the new product it was simply known
as SPSS (the X was dropped). Current versions are SPSS for Windows, the
most recent of which is now Release 9.0. This package is an integrated system
of computer programs originally developed for the analysis of social sciences 
data. The package provides great flexibility in data formation, data transfor- 
mation, and manipulation of files. Some of the procedures currently available 
include descriptive analysis, simple and partial correlations, one-way and mul- 
tiway analysis of variance, linear and nonlinear regression, loglinear analysis, 
reliability and life tables, and a variety of multivariate methods. 

For a discussion of instructions for preparing SPSS files and running an SPSS
procedure, the reader is referred to the SPSS manuals and other publications
documenting SPSS procedures. For applications involving SPSS for Windows,
SPSS Base 7.5 for Windows User’s Guide (SPSS Inc., 1997a) provides the
most comprehensive coverage of data management, graphics,
and basic statistical procedures. Two other manuals, SPSS Professional
Statistics 7.5 (SPSS Inc., 1997b) and SPSS Advanced Statistics (SPSS Inc.,
1997c), are also very useful for broad coverage and documentation
of intermediate and advanced statistical procedures and for the interpretation
of SPSS data analysis. Some other related publications providing easy-to-use
instructions for running SPSS programs include Crisler (1991), Hedderson 
(1991), Frude (1993), Hedderson and Fisher (1993), Lurigio et al. (1995), and 
Coakes and Steed (1997). The monograph by Levine (1991) is a very useful 
guide to SPSS for performing analysis of variance and other related procedures. 

There are several SPSS procedures for performing analysis of variance. The
simplest is ONEWAY, for performing one-way analysis of variance.⁶ The other
programs will also do a one-way analysis of variance, but ONEWAY has a number
of attractive features which make it a very useful procedure. In addition to
the standard analysis of variance table, ONEWAY produces the following
statistics for each group: number of cases, minimum, maximum, mean, standard
deviation, standard error, and 95 percent confidence interval for the mean.
ONEWAY also provides tests for linear, quadratic, and higher-degree polynomial
trends across the means of ordered groups, as well as user-specified a priori
contrasts (coefficients for the means) to test specific linear relations among
the means (e.g., comparing the means of two treatment groups against that of
a control group). Additional features include a number of multiple comparison
tests, including more than a dozen methods for testing pairwise differences in
means and 10 multiple range tests for identifying subsets of means that are
not different from each other.

6 The MEANS procedure is even more basic than ONEWAY, although it does require a
few extra mouse clicks or a subcommand keyword to obtain analysis of variance
results.
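The core of what ONEWAY reports can be sketched in a few lines of Python. This is an illustration of the standard one-way computations, not SPSS code; the function name `oneway`, the dictionary keys, and the data are our own, and the 95 percent confidence intervals are omitted because they need a t quantile, which the Python standard library does not provide.

```python
import statistics
from math import sqrt


def oneway(groups):
    """One-way ANOVA for a list of groups, plus ONEWAY-style group summaries."""
    all_obs = [y for g in groups for y in g]
    grand_mean = statistics.fmean(all_obs)
    k, n_total = len(groups), len(all_obs)

    summaries = []
    ss_between = ss_within = 0.0
    for g in groups:
        mean = statistics.fmean(g)
        sd = statistics.stdev(g)               # sample SD, n - 1 divisor
        summaries.append({"n": len(g), "min": min(g), "max": max(g),
                          "mean": mean, "sd": sd, "se": sd / sqrt(len(g))})
        ss_between += len(g) * (mean - grand_mean) ** 2
        ss_within += sum((y - mean) ** 2 for y in g)

    df_between, df_within = k - 1, n_total - k
    ms_between, ms_within = ss_between / df_between, ss_within / df_within
    table = {"SS_between": ss_between, "df_between": df_between,
             "SS_within": ss_within, "df_within": df_within,
             "MS_between": ms_between, "MS_within": ms_within,
             "F": ms_between / ms_within}
    return summaries, table


summaries, table = oneway([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(table["F"])   # 27.0 for these hypothetical data
```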

Three other SPSS procedures commonly used for analyzing higher
level designs are ANOVA, MANOVA, and GLM. ANOVA is used for
performing analysis of variance for a factorial design involving two or more
factors. For a model with five or fewer factors, the default ANOVA option
provides a full factorial analysis including all the interaction terms up to
order five. The user has the option to control the number of interaction terms
to be included in the model, and any interaction effects that are not computed
are pooled into the residual or error sum of squares. However, the user's control
over interactions in ANOVA is limited to specifying a maximum order of
interactions to include, which means that all interactions at and below that
order are included. For example, in a design with factors A, B, and C, the only
models that ANOVA will fit (assuming no empty cells) with all three factors used
are: main effects only; main effects and all three two-way interactions; and the
full factorial model. The user does not have the option of fitting a model such
as A, B, C, A*B, where some but not all of the interactions of a particular
order are included.
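The restriction on ANOVA's interaction control amounts to the rule that a single maximum order determines the whole model. The following Python sketch, our own illustration rather than anything SPSS provides, enumerates the term lists that each maximum order yields for three factors; any model outside this list, such as A, B, C, A*B, cannot be requested.

```python
from itertools import combinations


def terms_up_to_order(factors, max_order):
    """All main effects and interactions of order <= max_order,
    listed hierarchically (main effects, then two-way, and so on)."""
    terms = []
    for order in range(1, max_order + 1):
        for combo in combinations(factors, order):
            terms.append("*".join(combo))
    return terms


factors = ["A", "B", "C"]
for k in (1, 2, 3):
    print(k, terms_up_to_order(factors, k))
```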

MANOVA and GLM are probably the most versatile and complex of
all the SPSS procedures and can accommodate both balanced and unbalanced
designs, including nested or nonfactorial designs, multivariate data, and analyses
involving random and mixed effects models. Both procedures are based on a
general linear model program and accept multiple DESIGN subcommands;
however, GLM honors only the last DESIGN subcommand it encounters.
The main difference between the two procedures in terms of statistical design is
that MANOVA uses a full-rank reparametrization, whereas GLM uses the
generalized inverse approach to accommodate a non-full-rank overparametrized
model. In SPSS version 7.5, MANOVA is available only through syntax
commands, while GLM is available both in syntax and via dialog boxes.
In addition, GLM offers a variety of features unavailable in MANOVA
(see, e.g., SPSS Inc., 1997c, pp. 345-346). For example, GLM tests the
univariate homogeneity of variance assumption using the Levene test and provides
a number of different multiple comparison tests for unadjusted one-way factor
means, whereas these options are unavailable in MANOVA.
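The difference between a full-rank reparametrization and an overparametrized model shows up in the rank of the design matrix. In this Python sketch (a hypothetical one-way layout with three levels and two observations each), the overparametrized matrix has an intercept plus one indicator per level, so its columns are linearly dependent and a generalized inverse is needed; deviation (sum-to-zero) coding is used as one common full-rank alternative. We do not claim this is exactly the coding either SPSS procedure uses internally.

```python
from fractions import Fraction


def rank(matrix):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue                       # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


levels = [1, 1, 2, 2, 3, 3]   # hypothetical one-way layout

# Overparametrized: intercept + one indicator per level (4 columns).
over = [[1] + [1 if lv == j else 0 for j in (1, 2, 3)] for lv in levels]


def dev(lv, j):
    """Deviation coding: level 3 is coded -1 on both effect columns."""
    return 1 if lv == j else (-1 if lv == 3 else 0)


full = [[1, dev(lv, 1), dev(lv, 2)] for lv in levels]

print(rank(over), len(over[0]))   # rank 3 with 4 columns: rank deficient
print(rank(full), len(full[0]))   # rank 3 with 3 columns: full column rank
```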


Remark: In Release 8.0, ANOVA is available via command syntax only, and in its
place a new procedure, UNIANOVA, for performing univariate analysis of variance has
been introduced. UNIANOVA is simply a univariate version of the GLM procedure,
intended primarily for designs with one dependent variable.


In an SPSS analysis of variance procedure, the variables are specified by the
keyword naming the procedure, followed first by the dependent variable and then,
after the keyword BY, by the independent variables or factors. The levels of a factor
are specified by parentheses that give the minimum and maximum
numeric values of the levels of the given factor. For example, if the factor A has
three levels coded as 1, 2, and 3, B has four levels coded as 1, 2, 3, and
4, and Y denotes the dependent variable, then a one-factor analysis of variance
using ONEWAY and a full factorial analysis using the ANOVA and MANOVA
procedures can be performed using the following commands:


ONEWAY Y BY A(1, 3)
ANOVA Y BY A(1, 3) B(1, 4)
MANOVA Y BY A(1, 3) B(1, 4).


Analysis of variance involving more than two factors can similarly be performed 
using either ANOVA or MANOVA procedures. For example, with three factors 
A, B, and C, each with three levels, a full factorial analysis can be performed 
using the following statements: 


ANOVA Y BY A(1, 3) B(1, 3) C(1, 3)
MANOVA Y BY A(1, 3) B(1, 3) C(1, 3).


For four factors A, B, C, and D, with A having 2 levels, B 3 levels, C 4 levels, 
and D 5 levels, the statements are: 


ANOVA Y BY A(1, 2) B(1, 3) C(1, 4) D(1, 5)
MANOVA Y BY A(1, 2) B(1, 3) C(1, 4) D(1, 5).


The syntax for GLM does not require level ranges for the factors
appearing in the model and is simply written as:


GLM Y BY A B
GLM Y BY A B C
GLM Y BY A B C D,
etc.


Remark: In Release 7.5 (and 8.0), ONEWAY does not require range specifications and 
does not honor them if they are included. 


The preceding commands for ANOVA, MANOVA, and GLM procedures 
without any further qualifications will assume a full factorial model. In 
MANOVA and GLM, a factorial model could be made more explicit by an 
additional statement separated from the MANOVA/GLM statement by a slash 
(/), and consisting of the keyword DESIGN followed by the symbol “=” which 
in turn is followed by a listing of all the factors including interactions. The 
interaction A x B between the two factors A and B is indicated by the use of 
the keyword BY connecting the two factors (i.e., A BY B). In GLM, one can 
either use the keyword BY or the asterisk (*) to join the factors involved in the
interaction term. For example, in the MANOVA/GLM command given previ- 
ously, the user can explicitly indicate the full factorial model by the following 
commands: 


MANOVA Y BY A(1, 2) B(1, 3)
/DESIGN = A, B, A BY B.
GLM Y BY A B
/DESIGN = A, B, A*B.


If the model is not a full factorial, the DESIGN statement simply lists the effects 
to be estimated and tested. For example, with three factors A, B, and C, a model
containing main effects and the interactions A x B and B x C can be specified 
by the following MANOVA commands: 


MANOVA Y BY A(1, 2) B(1, 3) C(1, 4)
/DESIGN = A, B, C, A BY B, B BY C.


ANOVA does not have a DESIGN subcommand; models other
than the full factorial are specified by using the MAXORDERS subcommand
to suppress interactions above a specified order.

MANOVA and GLM procedures also allow the use of nested models for 
nesting one effect within another. In MANOVA, nested models are indicated 
by connecting the two factors being nested by the keyword WITHIN (or just 
W). For example, for a two-factor nested design with factor B nested within 
factor A and the response units nested within factor B, the following commands 
are used for the MANOVA procedure: 


MANOVA Y BY A(1, 3) B(1, 3)
/DESIGN A, B WITHIN A. 


In GLM, one can either use the keyword WITHIN or a pair of parentheses to 
indicate the desired nesting. Thus, the nesting in the above example is indicated 
by 


GLM Y BY A B
/DESIGN A, B(A). 
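For the balanced two-factor nested design just specified, the sums of squares for A and B(A) can be computed directly. The Python sketch below is illustrative; the function name and data are hypothetical. Its F ratios assume A fixed and B random, the case handled later in this section with the VS keyword (A against B(A), and B(A) against the error mean square).

```python
from statistics import fmean


def nested_anova(data):
    """Balanced two-factor nested ANOVA.

    data[i][j] holds the n responses for level j of B within level i of A.
    The F ratios assume A fixed and B random."""
    a, b, n = len(data), len(data[0]), len(data[0][0])
    cell = [[fmean(obs) for obs in row] for row in data]    # B(A) means
    a_mean = [fmean(row) for row in cell]                   # valid when balanced
    grand = fmean(a_mean)

    ss_a = b * n * sum((m - grand) ** 2 for m in a_mean)
    ss_b_in_a = n * sum((cell[i][j] - a_mean[i]) ** 2
                        for i in range(a) for j in range(b))
    ss_e = sum((y - cell[i][j]) ** 2
               for i in range(a) for j in range(b) for y in data[i][j])

    ms_a = ss_a / (a - 1)
    ms_b_in_a = ss_b_in_a / (a * (b - 1))
    ms_e = ss_e / (a * b * (n - 1))
    return {"MS_A": ms_a, "MS_B(A)": ms_b_in_a, "MS_E": ms_e,
            "F_A": ms_a / ms_b_in_a,       # A against B(A)
            "F_B(A)": ms_b_in_a / ms_e}    # B(A) against error


# Hypothetical data: a = 2 levels of A, b = 2 levels of B in each, n = 2.
data = [[[1, 3], [5, 7]], [[9, 11], [13, 15]]]
result = nested_anova(data)
print(result)
```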


Multiple nesting involving several factors is specified by the repeated use of the
keyword WITHIN. In GLM, one can also use more than one pair of parentheses,
where each pair must be enclosed or nested within another pair. For example, to
specify a three-factor completely nested design where B is nested within A and
C is nested within B, the following commands are used for the MANOVA/GLM
procedures:


MANOVA Y BY A(1, 3) B(1, 4) C(1, 4)
/DESIGN A, B WITHIN A, C WITHIN B WITHIN A.
GLM Y BY A B C
/DESIGN A, B(A), C(B(A)).


Remarks: (i) The DESIGN subcommand in MANOVA cannot be simply written as 
/DESIGN B WITHIN A; otherwise it would produce nonsensical sums of squares with
unbalanced data and an incorrect term and degrees of freedom in balanced or unbalanced 
designs. MANOVA’s reparameterization method in general requires that hierarchies be 
maintained when listing effects on the DESIGN subcommand. That is, /DESIGN B 
WITHIN A without A generally makes no sense, and /DESIGN A*B without both A 
and B in the model also makes no sense. Neither term estimates what the user wants it 
to estimate unless hierarchy is maintained. 

(ii) In GLM, the subcommand /DESIGN B(A) will fit the same model as /DESIGN 
A, B, A*B, or /DESIGN A, B(A), or /DESIGN B, A(B), or /DESIGN A, A*B, or 
/DESIGN B, A*B, or /DESIGN A*B, but the interpretation of the parameter estimates 
will differ in various cases. GLM doesn’t reparametrize, so if one leaves out contained 
effects while including containing ones, some of the parameters for the containing 
effects that would normally be redundant no longer are, and the degrees of freedom and 
interpretation of the fitted effects are altered. In order to fit the standard nested model, 
the subcommand should be specified as /DESIGN A, B(A). 


For the analysis of variance models containing random factors, the MANOVA 
or GLM procedures must be used. In MANOVA, special F tests involving a 
random and mixed model analysis are performed by specifying denominator 
mean squares via the use of the keyword VS within the design statement. All the 
denominator mean squares, other than the error mean square (which is referred 
to as WITHIN), must be named by a numeric code from 1 to 10. One can then 
test against the number assigned to the error term. It is permissible, and often
convenient, to test against an error term before defining it, as long as it is defined on
the same DESIGN subcommand. For example, in a factorial analysis involving the random
factors A and B where the main effects for the factors A and B are to be tested 
against the A x B mean square and the A x B effects are to be tested against 
the error mean square, the following commands are required: 


MANOVA Y BY A(1, 2) B(1, 3)
/DESIGN = A VS 1
B VS 1
A BY B = 1 VS WITHIN.


In the design statement given above, the first two lines specify that factors A
and B are to be tested against error term 1. The third line specifies that
error term 1 is defined as the A x B interaction, which in turn is to be tested
against the error mean square (defined by the keyword WITHIN). Suppose, in
the preceding example, A is fixed, B is a random factor, and A is to be tested
against the A x B interaction as error term 1 and B is to be tested against the usual
error mean square. To perform such a mixed model analysis, we would use the
following command sequence:


MANOVA Y BY A(1, 2) B(1, 3)
/DESIGN = A VS 1 
A BY B = 1 VS WITHIN 
B VS WITHIN. 
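The arithmetic behind these error-term assignments can be sketched for a balanced two-factor layout. The Python function below (name and data hypothetical) computes the mean squares once and reports both denominators for B that appear in this section: the error mean square, as in the MANOVA listing just given, and the A x B mean square, as in the GLM RANDOM example described later. In either case the fixed effect A is tested against A x B.

```python
from statistics import fmean


def two_way_mixed(data):
    """Balanced two-way ANOVA; data[i][j] holds the n responses at level i
    of the fixed factor A and level j of the random factor B."""
    a, b, n = len(data), len(data[0]), len(data[0][0])
    cell = [[fmean(c) for c in row] for row in data]
    a_mean = [fmean(row) for row in cell]
    b_mean = [fmean([cell[i][j] for i in range(a)]) for j in range(b)]
    grand = fmean(a_mean)

    ss_a = b * n * sum((m - grand) ** 2 for m in a_mean)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_mean)
    ss_ab = n * sum((cell[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_e = sum((y - cell[i][j]) ** 2
               for i in range(a) for j in range(b) for y in data[i][j])

    ms_a = ss_a / (a - 1)
    ms_b = ss_b / (b - 1)
    ms_ab = ss_ab / ((a - 1) * (b - 1))
    ms_e = ss_e / (a * b * (n - 1))
    return {
        "F_A": ms_a / ms_ab,               # fixed A against A*B
        "F_AB": ms_ab / ms_e,              # A*B against error
        "F_B_restricted": ms_b / ms_e,     # B against error (MANOVA listing)
        "F_B_unrestricted": ms_b / ms_ab,  # B against A*B (GLM RANDOM B)
    }


# Hypothetical data: 2 levels of A, 2 levels of B, 2 replicates per cell.
res = two_way_mixed([[[1, 3], [4, 6]], [[5, 7], [12, 14]]])
print(res)
```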


Similarly, consider a two-factor nested design where a random factor B is nested 
within a fixed factor A. Now, the following commands will execute appropriate 
tests of the effects due to A and B(A) factors: 


MANOVA Y BY A(1, 2) B(1, 3)
/DESIGN = A VS 1 
B WITHIN A = 1 VS WITHIN. 


The first line in the design statement specifies that the factor A effect is to be 
tested against the error term 1. The second line specifies the error term 1 as 
the mean square due to the B(A) effect, and further specifies that this effect is 
to be tested against the usual error mean square (designated by the keyword 
WITHIN). For a three-factor completely nested design where a random factor 
C is nested within a random factor B which in turn is nested within a random 
factor A, the following commands are required to perform appropriate tests of 
the effects due to A, B(A), and C(B): 


MANOVA Y BY A(1, 2) B(1, 3) C(1, 4)
/DESIGN = A VS 1 
B WITHIN A = 1 VS 2 
C WITHIN B WITHIN A = 2 VS WITHIN. 


In the above example, A is tested against the error term number 1 which is de- 
fined as B WITHIN A mean square; B is tested against the error term number 2 
which is defined as the C WITHIN B WITHIN A mean square; and C is tested 
against the usual error mean square. 


Remarks: (i) The default analysis without a VS specification assumes that all factors
are fixed. The use of the keyword VS within the design statement is the main device
available for tailoring the analysis to random and mixed effects models.

(ii) The assignment of error terms via the VS specification is appropriate only for
balanced designs. For unbalanced designs, pseudo-F tests based on the Satterthwaite
procedure will generally be required.
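A pseudo-F test replaces the single denominator mean square with a linear combination of mean squares whose expectation matches that of the hypothesis mean square under the null hypothesis, and assigns it Satterthwaite's approximate degrees of freedom, $\nu = (\sum_i c_i MS_i)^2 / \sum_i [(c_i MS_i)^2/\nu_i]$. The Python sketch below implements that formula; the function name and the mean squares are hypothetical, and the example combination MS_AB + MS_AC - MS_ABC is the usual synthetic error term for A in a balanced three-factor random model.

```python
def satterthwaite(mean_squares, dfs, coefs):
    """Return (synthetic mean square, approximate df) for sum(c_i * MS_i),
    using nu = (sum c_i MS_i)^2 / sum((c_i MS_i)^2 / df_i)."""
    combo = sum(c * m for c, m in zip(coefs, mean_squares))
    denom = sum((c * m) ** 2 / d for c, m, d in zip(coefs, mean_squares, dfs))
    return combo, combo ** 2 / denom


# Hypothetical three-factor random model: MS_A is tested against
# MS_AB + MS_AC - MS_ABC.
ms_a, df_a = 90.0, 2
error_ms, error_df = satterthwaite([20.0, 15.0, 5.0], [4, 4, 8], [1, 1, -1])
pseudo_f = ms_a / error_ms
# Refer pseudo_f to an F distribution with (df_a, error_df) degrees of freedom.
print(error_ms, round(error_df, 3), pseudo_f)
```

Because the squared terms in the denominator ignore the sign of each coefficient, the same function handles combinations that subtract a mean square.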
In GLM, the random or mixed effects analysis of variance is performed by
a subcommand containing the keyword RANDOM followed by the names of all
the factors which are to be treated as random. If a factor A is specified as a
random effect, then all the two-factor and higher order interaction effects
containing that effect are automatically treated as random effects. When
the RANDOM subcommand is used, the appropriate error terms for testing
hypotheses concerning all the effects in the model are determined
automatically. When more than one mean square must be combined to form the
appropriate error term, a pseudo-F test based on the Satterthwaite procedure
is performed. Several random effects can be specified on a single RANDOM
subcommand, or one may use more than one RANDOM subcommand, which
have a cumulative effect. For example, to perform a two-factor random
effects analysis of variance involving factors A and B, the following statements
are required:


GLM Y BY A B
/DESIGN A, B, A*B 
/RANDOM A, B. 


In the above example, the hypothesis testing for each effect will be automatically 
carried out against the appropriate error term. Thus, the main effects A and B 
are tested against the A x B interaction, which in turn is tested against the usual
error term. Suppose, in the example above, that A is fixed and B is random. 
To perform a mixed model analysis, we would use the following sequence of 
statements: 


GLM Y BY A B
/DESIGN A, B, A*B 
/RANDOM B. 


In the above example, the effects B and A x B are treated as random effects. A
and B are tested against the A x B interaction, while A*B is tested against the usual
error term. In addition, GLM also allows the option of a user-specified error term
to test a hypothesis via the subcommand TEST. To use this subcommand,
the user must specify both the hypothesis term and the error term, separated by
the keyword VS. The hypothesis term must be a valid effect specified or implied
in the DESIGN subcommand and must precede the keyword VS. The error term
can be a numerical value or a linear combination of valid effects. A coefficient in
a linear combination can be a real number or a fraction. If a value is specified for
the error term, one must specify the number of degrees of freedom following the
keyword DF. The degrees of freedom must be a positive real number. Multiple
TEST subcommands are allowed and are executed independently. Thus, in the
two-factor mixed model example considered above, suppose an alternate mixed
model analysis is performed where both A and B main effects are to be tested
against the A x B interaction. Now, the following statements are required to
perform the appropriate test via the TEST subcommand:


GLM Y BY A B
/DESIGN A, B, A*B
/RANDOM B
/TEST A VS A*B
/TEST B VS A*B.


Further, suppose that the B effect is to be pooled with the A x B term and A is to be
tested against the pooled mean square. To achieve this, the following syntax is
required:


GLM Y BY A B
/DESIGN A, B, A*B
/TEST A VS B + A*B.


In addition to ONEWAY, ANOVA, MANOVA, and GLM procedures, SPSS 
7.5 (and 8.0) incorporates a new procedure, VARCOMP, especially designed 
to estimate variance components in a random or mixed effects analysis of vari- 
ance model. It can be used through syntax commands or via dialog boxes. 
Similar to GLM, the random factors are specified by the use of a RANDOM 
subcommand. There must be at least one RANDOM subcommand with one 
random factor; several random factors can be specified on a single RANDOM 
statement or one can use multiple RANDOM subcommands which have a cu- 
mulative effect. Four methods of estimation are available in the VARCOMP 
procedure which can be specified via the use of a METHOD subcommand. 
For example, in a two-factor mixed model with A fixed and B random, the 
following syntax will estimate the variance components using the maximum 
likelihood method: 


VARCOMP Y BY A B
/DESIGN A, B, A*B 
/RANDOM B 
/METHOD = ML. 


Remarks: (i) Four methods of estimation in the VARCOMP procedure are: anal- 
ysis of variance (ANOVA), maximum likelihood (ML), restricted maximum likeli- 
hood (REML), and minimum norm quadratic unbiased estimator (MINQUE). The 
ANOVA, ML, REML, and MINQUE methods of estimation are specified by the
keywords SSTYPE(n) (n = 1 or 3), ML, REML, and MINQUE(n) (n = 0 or 1),
respectively, on the METHOD subcommand. The default method of estimation is MINQUE(1).
MINQUE(1) assigns unit weight to both the random effects and the error term, while
MINQUE(0) assigns zero weight to the random effects and unit weight to the error
term. The ANOVA method uses Type I or Type III sums of squares, designated by the
keywords SSTYPE(1) and SSTYPE(3) respectively, the latter being the default option
for this method.

(ii) When using ML and REML methods of estimation, the user can specify the 
numerical tolerance for checking singularity, convergence criterion for checking relative 
change in the objective function, and the maximum number of iterations by use of the 
following keywords on the CRITERIA subcommand: (a) EPS(n): the epsilon value used
in checking singularity, where n > 0 and the default is 1.0E-8; (b) CONVERGE(n): the
convergence value, where n > 0 and the default is 1.0E-8; (c) ITERATE(n): the maximum
number of iterations to be performed, where n must be a positive integer and the default is 50.

(iii) The user can control the display of optional output by the following keywords on
the PRINT subcommand: (a) EMS: when using SSTYPE(n) on the METHOD
subcommand, this option prints expected mean squares of all the effects; (b) HISTORY(n):
when using ML or REML on the METHOD subcommand, this option prints a table
containing the value of the objective function and variance component estimates at every
n-th iteration, where n is a positive integer and the default is 1; (c) SS: when
using SSTYPE(n) on the METHOD subcommand, this option prints a table containing
sums of squares, degrees of freedom, and mean squares for each source of variation.


11.3 ANALYSIS OF VARIANCE USING BMDP⁷


BMDP programs are successors to the BMD (biomedical) computer programs
developed under the direction of W. J. Dixon at the Health Service Computing
Facility of the Medical Center of the University of California during the early
1960s. Since 1975 the BMDP series has virtually replaced the BMD package.
The BMDP series provides the user with more flexible descriptive language, 
newly available statistical procedures, powerful computing algorithms, and the 
capability of performing repeated analyses from the same data file. The BMDP 
programs are arranged in six categories: data description, contingency table, 
multivariate methods, regression, analysis of variance, and special programs. 
Some of the procedures covered in the BMDP series are: contingency tables, 
regression analysis, nonparametric methods, robust estimators, the analysis of 
repeated measures, and graphical output which includes histograms, bivariate 
plots, normal probability plots, residual plots, and factor loading plots. 

For a discussion of instructions for creating BMDP files and running BMDP 
programs, the reader is referred to BMDP manuals (Dixon 1992). In addition, 
a comprehensive volume, Applied Statistics: A Handbook of BMDP Analyses 
by Snell (1987), provides easy instructions to enable the student to use BMDP 
programs to analyze statistical data. The main BMDP program for performing 
analysis of variance is the BMDP 2V. It is a flexible general purpose program 
for analysis of variance, and can handle both balanced and unbalanced designs. 
It performs analysis of variance or covariance for a wide variety of fixed effects 
models. It can accommodate any number of grouping factors, including repeated
measures, which, however, must be crossed (not nested). The program can also
distinguish group factors from repeated measures factors. In addition,
there are several other programs, including 7D, 3V, and 8V, that can be used to
perform an analysis of variance. 7D performs one- and two-way fixed effects
analysis of variance; its output includes descriptive statistics and side-by-side
histograms for all the groups, and other diagnostics for thorough data
screening, including a separate tally of missing or out-of-range values. In
addition, it performs the Welch and Brown-Forsythe tests, which do not assume
homogeneity of variances, and an analysis of variance based on trimmed means,
including confidence intervals for each group. 3V performs an analysis of variance
for a general mixed model, including balanced or unbalanced designs. It uses
maximum and restricted maximum likelihood approaches to estimate the fixed
effects and the variance components of the model. The output includes descriptive
statistics for the dependent variable and for any covariate(s). It also provides a
number of other optional outputs, including parameter estimates for specified
hypotheses and likelihood ratio tests with degrees of freedom and p-values. The
program 8V can perform an analysis of variance for a general mixed model having
balanced data. It can handle crossed, nested, and partially nested designs
involving fixed, random, or mixed effects models. The output from 8V
includes an analysis of variance table with columns for expected mean squares
defined in terms of variance components and F values, as well as the overall
mean, cell means and corresponding standard deviations, and estimates of variance
components.

7 BMDP software is now owned and distributed by SPSS, Inc., Chicago, Illinois.

Generally, using the simplest possible program adequate for analyzing a 
given anova design is recommended. The programs 7D and 2V are commonly 
used for the fixed effects model whereas 3V and 8V can be used for the random 
and mixed effects models. For designs involving balanced data, 8V is recom- 
mended since it is simpler to use. For designs with unbalanced data, 3V should 
be used. For random and mixed effects models, in addition to performing 
standard analysis of variance, the program 3V also provides variance compo- 
nents estimates using maximum likelihood and restricted maximum likelihood 
procedures. In addition, BMDP has a number of other programs, namely, 3D, 
1V, 4V, and 5V, which can be employed for certain special types of anova 
designs. 3D performs two-group tests (with equal or unequal group variances, 
paired or independent groups, Levene’s test for the equality of group variances, 
trimmed t test, and the nonparametric forms of the tests including Spearman
correlation and Wilcoxon’s signed-rank and rank-sum tests) and 1V performs 
one-way analysis of covariance to test the equality of adjusted group means. 4V 
performs univariate and multivariate analysis of variance and covariance for a 
wide variety of models including repeated measures, split-plot, and cross-over 
designs. The output includes, among other things, a summary of descriptive 
statistics, multivariate statistics, and analysis of covariance. Finally, 5V
analyzes repeated measures data for a wide variety of models including those with
unequal variances, covariances with specified patterns, and missing data. 
11.4 USE OF STATISTICAL PACKAGES
FOR COMPUTING POWER 


Consider an analysis of variance F test of a fixed factor effect to be performed
against the critical value $F[\nu_1, \nu_2; 1-\alpha]$. Let $\lambda$ be the noncentrality parameter
under the alternative hypothesis. The following SAS statement can be used to
calculate the required power:


PR = 1 - PROBF(FC, NU1, NU2, LAMBDA);


where PR, FC, NU1, NU2, and LAMBDA are simply SAS names chosen to denote
the power, $F[\nu_1, \nu_2; 1-\alpha]$, $\nu_1$, $\nu_2$, and $\lambda$, respectively. For an F test involving a
random factor, where the distribution of the test statistic under the alternative
hypothesis depends only on the (central) F distribution, the power can be
calculated using the following statement:


PR = 1 - PROBF(FC/KAPA, NU1, NU2);


where KAPA is the proportionality factor determined as a function of the design 
parameters and the values of the variance components under the alternative 
hypothesis. 

The power calculations should generally be performed for different values
of $\nu_1$, $\nu_2$, $\lambda$, and KAPA. This is a useful exercise for investigating the
range of power values possible for different values of the design parameters
and the size of the factor effect to be detected under the alternative. These results can
then be used to set the parameters of the given analysis of variance design so
as to provide adequate power to detect a factor effect of a given size.
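The two SAS formulas can be mirrored in plain Python for readers without SAS. In this sketch, `f_cdf` and `f_ppf` are our own stand-ins for PROBF and FINV: the central F CDF is computed from the regularized incomplete beta function (evaluated by its standard continued fraction), and the noncentral F CDF as a Poisson mixture whose Poisson mean is $\lambda/2$. We assume this is the noncentrality convention PROBF uses; if $\lambda$ is defined differently elsewhere, rescale it before calling. KAPA is supplied directly to `power_random`; in a balanced one-way random model, for instance, it would be $1 + n\sigma_\alpha^2/\sigma_e^2$. The usage lines scan the power for several sample sizes in a hypothetical one-way layout with four groups and $\lambda = 2n$.

```python
from math import exp, lgamma, log


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(x, nu1, nu2, lam=0.0, max_terms=1000):
    """F CDF; lam > 0 gives the noncentral F (Poisson mixture, mean lam/2)."""
    if x <= 0.0:
        return 0.0
    y = nu1 * x / (nu1 * x + nu2)
    if lam == 0.0:
        return betainc(nu1 / 2.0, nu2 / 2.0, y)
    total, log_w = 0.0, -lam / 2.0          # log Poisson weight for j = 0
    for j in range(max_terms):
        w = exp(log_w)
        total += w * betainc(nu1 / 2.0 + j, nu2 / 2.0, y)
        if j > lam and w < 1e-16:           # past the mode, weights negligible
            break
        log_w += log(lam / 2.0) - log(j + 1.0)
    return total


def f_ppf(p, nu1, nu2):
    """Central F quantile by bisection (a stand-in for SAS FINV)."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, nu1, nu2) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, nu1, nu2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_fixed(alpha, nu1, nu2, lam):
    """Mirrors PR = 1 - PROBF(FC, NU1, NU2, LAMBDA)."""
    fc = f_ppf(1.0 - alpha, nu1, nu2)
    return 1.0 - f_cdf(fc, nu1, nu2, lam)


def power_random(alpha, nu1, nu2, kappa):
    """Mirrors PR = 1 - PROBF(FC/KAPA, NU1, NU2)."""
    fc = f_ppf(1.0 - alpha, nu1, nu2)
    return 1.0 - f_cdf(fc / kappa, nu1, nu2)


# Hypothetical one-way layout: 4 groups of n, lambda = 2n.
for n in (3, 5, 8):
    print(n, round(power_fixed(0.05, 3, 4 * (n - 1), 2.0 * n), 3))
```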

In SPSS MANOVA, exact and approximate power values can be computed
via the POWER subcommand. If POWER is specified by itself without
any keyword, MANOVA calculates approximate power values of all F tests at
the 0.05 significance level. The following keywords are available on the POWER
subcommand: 


APPROXIMATE — This option calculates approximate power values which 
are generally accurate to three decimal places and are much more economical 
to calculate than the exact values. This is the default option if no keyword is 
used. 

EXACT — This option calculates exact power values using the noncentral 
incomplete beta distribution. 

F(a) — This option permits the specification of the alpha value at which the 
power is to be calculated. The value of alpha must be a number between 0 
and 1, exclusive. The default alpha value is 0.05. 


In SPSS GLM, the observed power for each F test at 0.05 significance level is 
displayed by default. An alpha value other than 0.05 can be specified by the 
use of the keyword ALPHA(n) in the CRITERIA subcommand. The value of
n must be between 0 and 1, exclusive.


Remarks: (i) In SPSS Release 8.0, the observed power for each F test is no longer
printed by default. Instead, checkboxes in the Options dialog box are available for effect
size estimates and observed power, with the corresponding keywords ETASQ and OPOWER
on the PRINT subcommand.

(ii) SPSS also has inverse distribution functions and noncentral t, $\chi^2$, and F
distributions, which can be used for power analysis in the same manner as in SAS.


11.5 USE OF STATISTICAL PACKAGES FOR MULTIPLE 
COMPARISON PROCEDURES 


Most multiple comparison tests are more readily performed manually using the 
results on means, sums of squares, and mean squares provided by a computer 
analysis. The hand computations are quite simple and the use of a computing 
program does not necessarily save any time. However, some of the more recent 
methods are better suited to a computer analysis. For all three computing
packages considered here, the programs designed to perform t tests do not allow
the use of a custom-made error mean square (EMS). Some programs in some
packages that allow the use of a t test on a contrast also do not permit the
use of a custom-made EMS. SAS provides the user the option of specifying a
custom-made EMS whereas the error term employed in SPSS ONEWAY and 
BMDP is fixed. Most statistical packages will perform all possible pairwise 
comparisons as the default when the option to perform multiple comparisons is 
requested. More general comparisons are also available, but may require some 
restrictions in terms of which procedures may be used. 

SAS can perform pairwise comparisons using either PROC ANOVA or
PROC GLM. There are 16 different multiple comparison procedures, including
Tukey, Scheffé, LSD, Bonferroni, Newman-Keuls, and Duncan, which can be 
readily implemented using the following statement for both the procedures: 


MEANS {name of independent variable}/{SAS name of the multiple 
comparison to be used} 


For example, in order to perform Duncan’s multiple comparison on a balanced 
one-way experimental layout using PROC ANOVA, the following commands 
may be used: 


PROC ANOVA; 
CLASS A; 
MODEL Y = A; 


MEANS A/ DUNCAN; 


Both ANOVA and GLM procedures allow the use of as many multiple compar- 
isons as the user may want simply by listing their SAS names and separating the 
names with blanks. The name of the independent variable in the MEANS
statement must be the name chosen for the independent variable when the data file
is created. For testing more general contrasts than pairwise comparisons, the 
sum of squares for a contrast can be obtained by a CONTRAST command. For 
example, the contrast C1 with four coefficients can be tested by the statements: 


PROC GLM;
CLASS A;
MODEL Y = A;
CONTRAST 'C1' A 1 1 -1 -1;


Any number of orthogonal contrasts can be tested by including additional
CONTRAST statements, for example,


CONTRAST 'C1' A 1 1 -1 -1;
CONTRAST 'C2' A 1 -1 1 -1;
CONTRAST 'C3' A 1 -1 -1 1;


Remark: In Release 6.12, the LSMEANS statements in PROC GLM and PROC MIXED 
with ADJUST options offer several different multiple comparison procedures. In addi- 
tion, PROC MULTTEST can input a set of p-values and adjust them for multiplicity. 


SPSS can perform multiple comparison tests by using the ONEWAY and 
GLM procedures. In ONEWAY, multiple comparisons are implemented by 
the use of RANGES and POSTHOC subcommands. The RANGES subcom- 
mand allows for the following seven tests specified by the respective keywords: 
Least Significant Difference (LSD), Bonferroni (BONFERRONI), Dunn-Sidak 
(SIDAK), Tukey (TUKEY), Scheffé (SCHEFFE), Newman-Keuls (SNK) and 
Duncan (DUNCAN). (LSDMOD and MODLSD are acceptable for the Bonferroni
procedure.) Several tests can be requested in the same run by repeating the
RANGES subcommand, e.g.,


/RANGES = LSD 
/RANGES = TUKEY 


etc. 


Remark: The LSD test does not maintain a single overall α-level. However, the Bonferroni
and Dunn-Sidak procedures can be performed by adjusting the overall α-level according
to the number of desired comparisons and then using the LSD at the modified α-level.
The default α-level for the LSD is 0.05. If some other α-level is desired, it is specified
within parentheses immediately following the keyword LSD in the RANGES subcommand,
for example,
/RANGES = LSD (0.0167).


The POSTHOC subcommand offers options for twenty different multiple com-
parisons, including those that test the pairwise differences among means and
those that identify homogeneous subsets of means that are not different from each
other. The latter tests are commonly known as multiple range tests. Among the
different types of multiple comparisons available via the POSTHOC subcommand are:
Bonferroni, Sidak, Tukey's honestly significant difference, Hochberg's GT2,
Gabriel, Dunnett, Ryan-Einot-Gabriel-Welsch F test (R-E-G-W F), Ryan-
Einot-Gabriel-Welsch range test (R-E-G-W Q), Tamhane's T2, Dunnett's T3,
Games-Howell, Dunnett's C, Duncan's multiple range test, Student-Newman-
Keuls (S-N-K), Tukey's b, Waller-Duncan, Scheffé, and least-significant differ-
ence. One can use as many of these tests in one run as one wants, either using
one POSTHOC subcommand or a stack of them.


Remark: Tukey’s honestly significant difference test, Hochberg’s GT2, Gabriel’s test, 
and Scheffé’s test are multiple comparison tests and range tests. Other available range 
tests are Tukey’s b, S-N-K (Student-Newman-Keuls), Duncan, R-E-G-W F (Ryan- 
Einot-Gabriel-Welsch F test), R-E-G-W Q (Ryan-Einot-Gabriel-Welsch range test), 
and Waller-Duncan. Among available multiple comparison tests are Bonferroni, Tukey's
honestly significant difference test, Sidak, Gabriel, Hochberg, Dunnett, Scheffé, and
LSD (least significant difference). Multiple comparison tests that do not assume equal
variances are Tamhane's T2, Dunnett's T3, Games-Howell, and Dunnett's C.


For testing more general contrasts than the pairwise comparisons between
the means, the subcommand RANGES is replaced by CONTRAST. The
coefficients defining the contrast are specified immediately after the keyword
CONTRAST, separated from it by an equals sign (=). The
coefficients can be separated by spaces or commas. For example, the contrast
with the four coefficients 1, 1, -1, -1 is tested by the subcommand


/CONTRAST = 1 1 -1 -1


or
/CONTRAST = 1, 1, -1, -1.


As with the pairwise multiple comparison procedures, several contrasts
can be tested by including more CONTRAST subcommands; for
example,


/CONTRAST = 1 1 -1 -1
/CONTRAST = 1 -1 1 -1
/CONTRAST = 1 -1 -1 1


etc.
Also, one can use both RANGES and CONTRAST subcommands following
the same ONEWAY command. However, all the RANGES subcommands
must be placed consecutively, followed by all the CONTRAST sub-
commands. The CONTRAST subcommand outlined previously is equivalent
to performing a t test and is appropriate for a single a priori comparison of
interest. If the researcher is interested in many simultaneous comparisons that
include complex contrasts as well as pairwise ones, a multiple comparison procedure
based on general contrasts needs to be employed. Although Scheffé's method
is available in SPSS, it can be used only for pairwise comparisons. For
more complex contrasts, however, one can use the CONTRAST subcommand to
get all the quantities needed for Scheffé's comparison and then perform the
computations manually.
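Once the CONTRAST output has supplied the contrast value and its standard error, the remaining Scheffé arithmetic is short. The Python sketch below is a minimal illustration, assuming a one-way layout with a groups and a common error mean square; the function `scheffe_contrast`, its arguments, and the example numbers are hypothetical, and the percentile F[a - 1, N - a; 1 - α] must be taken from a table because it is not computed here.

```python
import math

def scheffe_contrast(means, ns, mse, coefs, f_crit):
    """Scheffé test of one contrast in a one-way layout with a groups.

    f_crit is the tabled percentile F[a - 1, N - a; 1 - alpha]; it is an
    input because the Python standard library has no F quantile function.
    """
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    a = len(means)
    estimate = sum(c * m for c, m in zip(coefs, means))              # contrast value
    se = math.sqrt(mse * sum(c * c / n for c, n in zip(coefs, ns)))  # its standard error
    t_value = estimate / se                                          # ordinary t ratio
    critical = math.sqrt((a - 1) * f_crit)                           # Scheffé critical value
    return estimate, se, t_value, critical, abs(t_value) > critical

# Hypothetical data: four groups of five, MSE = 4, F[3, 16; 0.95] = 3.24
result = scheffe_contrast([10, 12, 15, 17], [5, 5, 5, 5], 4.0, [1, 1, -1, -1], 3.24)
print(result)
```

The contrast is declared significant when |t| exceeds the Scheffé critical value √((a - 1)F); in this example |t| is about 5.59, which exceeds √(3 × 3.24), about 3.12.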


Remarks: (i) One can list fractional coefficients when defining a contrast, thereby
indicating that averages are in fact being compared; however, fractional coefficients are not required.

(ii) Comparisons of means that do not define a contrast (coefficients that do not sum to zero) can
be tested using the CONTRAST subcommand. SPSS will analyze such a "noncontrast" but
will display a warning message that a noncontrast is being tested.


All the tests available in ONEWAY are also available in GLM (and in UNIANOVA
in Release 8.0) for performing multiple comparisons between the
means of a factor for the dependent variable; they are requested via the
POSTHOC subcommand. The user can specify one or more effects to be tested,
each of which must be a fixed main effect appearing or implied in the DESIGN
subcommand. The Type I error rate can be specified using the keyword ALPHA
on the CRITERIA subcommand. The default alpha value is 0.05, and the
default confidence level is 0.95. GLM also allows the use of an optional error
term, which can be defined following an effect to be tested by using the keyword
VS after the test specification. The error term can be any single effect that is neither
the intercept nor a factor tested on the POSTHOC subcommand. Thus, it can be an
interaction effect, and it can be fixed or random. Furthermore, GLM allows the
use of multiple POSTHOC subcommands, which are executed independently.
In this way the user can test different effects against different error terms. The
output for the pairwise comparisons includes the difference between
each pair of compared means, the confidence interval for the difference, and
its significance.

GLM also allows tests for contrasts via the CONTRAST subcommand. The
name of the factor is specified within parentheses following the subcommand
CONTRAST. After the parenthesized factor name, one must
enter an equals sign followed by one of the CONTRAST keywords. In addition to
the user-defined contrasts available via the keyword SPECIAL, several other
types of contrasts, including Helmert and polynomial contrasts, are also avail-
able. Although one can specify only one factor per CONTRAST subcommand,
multiple CONTRAST subcommands within the same design are allowed. Values
specified after the keyword SPECIAL are stored in a matrix in row order. For
example, if factor A has three levels, then CONTRAST (A) = SPECIAL (1 1
1 1 -1 0 0 1 -1) produces the following contrast matrix:


$$\begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix}$$


Suppose the factor A has three levels and Y designates the dependent variable 
being analyzed. The following example illustrates the use of a polynomial 
contrast: 


GLM Y BY A 
/CONTRAST (A) = POLYNOMIAL (1, 2, 4). 


The specified contrast indicates that the three levels of A are actually in the 
proportion 1:2:4. For illustration and examples of other contrast types, see 
SPSS Advanced Statistics 7.5 (SPSS Inc., 1997c, APPENDIX A, pp. 535- 
540). 
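For an unequally spaced metric such as 1, 2, 4, polynomial contrasts can be built as orthogonal polynomials evaluated at the level values. The sketch below does this by Gram-Schmidt orthonormalization of 1, x, x²; the function name is hypothetical, and we assume, without having checked the SPSS algorithm, that SPSS's POLYNOMIAL contrasts agree with these up to sign and scaling.

```python
import math

def orthogonal_polynomials(metric):
    """Orthonormal polynomial contrasts (linear, quadratic, ...) for the given level values."""
    k = len(metric)
    # Rows 1, x, x^2, ..., x^(k-1) evaluated at the level values
    basis = [[x ** d for x in metric] for d in range(k)]
    ortho = []
    for v in basis:
        w = list(v)
        for u in ortho:  # Gram-Schmidt: remove the component along each earlier (unit) vector
            proj = sum(a * b for a, b in zip(w, u))
            w = [a - proj * b for a, b in zip(w, u)]
        norm = math.sqrt(sum(a * a for a in w))
        ortho.append([a / norm for a in w])
    return ortho[1:]  # drop the constant row; the remaining rows sum to zero

lin, quad = orthogonal_polynomials([1, 2, 4])
print(lin, quad)
```

For the metric 1, 2, 4 the sketch yields a linear contrast proportional to (-4, -1, 5) and a quadratic contrast proportional to (2, -3, 1); both sum to zero and they are orthogonal to each other.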

As in SPSS, BMDP performs multiple comparisons by using its one-
way analysis of variance program. BMDP 7D contains a number of multiple
comparison procedures, including Tukey, Scheffé, LSD, Bonferroni, Newman-
Keuls, Duncan, and Dunnett. However, the reported significance levels for
the comparisons are for one-sided tests and should be multiplied by two to obtain
the overall significance level associated with a family of two-sided tests. BMDP
7D can perform all pairwise comparisons by using the following statement:


/COMPARISON {BMDP name of multiple comparison to be used}.


The program provides the option of several multiple comparison procedures,
which may be selected by listing the procedure names consecutively; for
example,


/COMPARISON TUKEY
            SCHEFFE
            BONFERRONI
            NK
            DUNCAN.


11.6 USE OF STATISTICAL PACKAGES 
FOR TESTS OF HOMOSCEDASTICITY 


All three packages (SAS, SPSS, and BMDP) provide several tests of
homoscedasticity, or homogeneity of variances. The MEANS statement in the SAS
GLM procedure now includes various options for testing homogeneity of vari-
ances in a one-way model and for computing Welch's variance-weighted one-
way analysis of variance test for differences between groups when the group
variances are unequal. The user can choose among Bartlett's, Levene's, Brown-
Forsythe, and O'Brien's tests by specifying the following options in the MEANS
statement:


HOVTEST = BARTLETT 

HOVTEST = LEVENE < (TYPE = ABS/SQUARE) > 
HOVTEST = BF 

HOVTEST = OBRIEN < (W = number) > 
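To make the LEVENE and BF options concrete, here is a minimal Python sketch of Levene-type tests as a one-way ANOVA on transformed deviations: squared or absolute deviations from the group means (TYPE = SQUARE or ABS) and absolute deviations from the group medians (the Brown-Forsythe variant). The function names are hypothetical, O'Brien's and Bartlett's tests are not covered, and SAS may differ in computational details; the sketch only illustrates the textbook definitions.

```python
from statistics import mean, median

def one_way_f(groups):
    """Ordinary one-way ANOVA F statistic with its degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df1, df2 = k - 1, n_total - k
    return (ss_between / df1) / (ss_within / df2), df1, df2

def levene(groups, kind="square"):
    """Levene-type test: one-way ANOVA on transformed deviations.

    kind="square": squared deviations from group means (the SAS default TYPE)
    kind="abs":    absolute deviations from group means
    kind="bf":     absolute deviations from group medians (Brown-Forsythe)
    """
    transformed = []
    for g in groups:
        if kind == "square":
            c = mean(g)
            transformed.append([(x - c) ** 2 for x in g])
        elif kind == "abs":
            c = mean(g)
            transformed.append([abs(x - c) for x in g])
        elif kind == "bf":
            c = median(g)
            transformed.append([abs(x - c) for x in g])
        else:
            raise ValueError("kind must be 'square', 'abs', or 'bf'")
    return one_way_f(transformed)

print(levene([[0, 2, 4], [0, 4, 8]]))  # (F, 1, 4)
```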


If no test is specified in the HOVTEST option, Levene's test (TYPE = SQUARE) is
performed by default. Welch's analysis of variance is requested via the
WELCH option in the MEANS statement. For a simple one-way model with
dependent variable Y and a single classification factor A, the following code
illustrates the use of the HOVTEST (default) and WELCH options in the MEANS
statement of the GLM procedure:


PROC GLM;
CLASS A;
MODEL Y = A;
MEANS A / HOVTEST WELCH;
RUN;


For further information and applications of homogeneity of variance tests in 
SAS GLM, see SAS Institute (1997, pp. 356-359). 
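Welch's test, requested above by the WELCH option, can be sketched from its usual formula: weights w_i = n_i/s_i², a variance-weighted grand mean, and an adjusted denominator. The sketch below is illustrative only (the function name is hypothetical); the statistic is referred to an F distribution with k - 1 and the fractional df2 degrees of freedom.

```python
from statistics import mean, variance

def welch_anova(groups):
    """Welch's variance-weighted one-way ANOVA; returns (F, df1, df2)."""
    k = len(groups)
    ns = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    w = [n / variance(g) for n, g in zip(ns, groups)]          # weights n_i / s_i^2
    sw = sum(w)
    grand = sum(wi * m for wi, m in zip(w, means)) / sw          # variance-weighted mean
    a = sum(wi * (m - grand) ** 2 for wi, m in zip(w, means)) / (k - 1)
    lam = sum((1 - wi / sw) ** 2 / (n - 1) for wi, n in zip(w, ns))
    b = 1 + 2 * (k - 2) * lam / (k * k - 1)
    return a / b, k - 1, (k * k - 1) / (3 * lam)

g1 = [1.0, 2.0, 3.0, 4.0, 5.0]
g2 = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
print(welch_anova([g1, g2]))
```

With two groups the adjustment term b equals 1, so the statistic reduces to the square of Welch's two-sample t with the Welch-Satterthwaite degrees of freedom, which makes a convenient check.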

SPSS ONEWAY provides an option to calculate the Levene statistic for homo-
geneity of group variances. In the MANOVA procedure the user can request the
keyword HOMOGENEITY in its PRINT subcommand. HOMO-
GENEITY performs tests for the homogeneity of variance of the dependent
variable across the cells of the design. One or more of the following specifi-
cations can be included in the parentheses after the keyword HOMOGENEITY:


BARTLETT — This option performs and displays Bartlett-Box F test. 

COCHRAN — This option performs and displays Cochran’s C test. 

ALL — This option performs and displays both Bartlett-Box F test and 
Cochran’s C test. This is the default option if HOMOGENEITY is requested 
without further specifications. 
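For readers who want to see what these statistics measure, the sketch below computes Bartlett's chi-square statistic and Cochran's C. Note that it computes the chi-square form of Bartlett's test, not the Bartlett-Box F approximation that SPSS prints, and Cochran's C is intended for groups of equal size; the function names are hypothetical.

```python
import math
from statistics import variance

def bartlett(groups):
    """Bartlett's statistic M / C, referred to chi-square with k - 1 df."""
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]
    s2 = [variance(g) for g in groups]
    df_error = sum(dfs)                                     # N - k
    pooled = sum(d * v for d, v in zip(dfs, s2)) / df_error
    m = df_error * math.log(pooled) - sum(d * math.log(v) for d, v in zip(dfs, s2))
    c = 1 + (sum(1 / d for d in dfs) - 1 / df_error) / (3 * (k - 1))
    return m / c, k - 1

def cochran_c(groups):
    """Cochran's C: largest group variance divided by the sum of the group variances."""
    s2 = [variance(g) for g in groups]
    return max(s2) / sum(s2)

equal = [[1, 2, 3], [4, 5, 6], [10, 11, 12]]  # identical within-group spread
print(bartlett(equal), cochran_c(equal))      # statistic 0 on 2 df, and C = 1/3
```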


The SPSS GLM procedure also performs Levene's test for equality of variances
for the dependent variable across all cells formed by combinations of between-subjects
factors. However, the Levene test in GLM (and UNIANOVA in Release
8.0) uses the residuals from whatever model is fitted and will differ from the


Analysis of Variance Using Statistical Computing Packages 567 


description given above if the model is not a full factorial or if there are
covariates. The test can be requested using the HOMOGENEITY keyword of the PRINT subcommand.
BMDP 7D performs Levene's test of homogeneity of variances among the cells
when performing one- and two-way analysis of variance for fixed effects.


11.7 USE OF STATISTICAL PACKAGES
FOR TESTS OF NORMALITY 


Tests of skewness and kurtosis can be performed using BMDP and SPSS, since
they provide standard errors of both statistics. SAS UNIVARIATE implements
the Shapiro-Wilk W test when the sample size is 2000 or less and a modified
Kolmogorov-Smirnov (K-S) test when the sample size is greater than 2000.
Also, the SPSS EXAMINE procedure offers the K-S Lilliefors and Shapiro-Wilk
tests, and NPAR TESTS offers a one-sample K-S test without the Lilliefors
correction. The K-S tests in both cases are performed for sufficiently large
sample sizes, while the Shapiro-Wilk test is given when the sample size is
50 or less.
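The skewness and kurtosis tests mentioned above divide each statistic by its standard error and refer the ratio to the standard normal distribution. The sketch below assumes the adjusted estimators G1 and G2 and the standard-error formulas commonly printed by packages such as SPSS; we have not verified that BMDP uses exactly these definitions, and the function name is hypothetical.

```python
import math

def skew_kurt_tests(x):
    """Return (G1/SE1, G2/SE2, SE1, SE2) for sample skewness G1 and excess kurtosis G2."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    z = [(v - m) / s for v in x]
    g1 = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    g2 = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
          - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    se1 = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se2 = 2 * se1 * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return g1 / se1, g2 / se2, se1, se2

z_skew, z_kurt, se1, se2 = skew_kurt_tests(list(range(1, 11)))
print(z_skew, se1, se2)  # z_skew is 0 for symmetric data; SEs about 0.687 and 1.334 at n = 10
```

A ratio larger than about 2 in absolute value is usually taken as evidence of non-normal skewness or kurtosis.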


Appendices 


A STUDENT'S t DISTRIBUTION


If the random variable X has a normal distribution with mean μ and variance σ²,
we denote this by $X \sim N(\mu, \sigma^2)$. Let $X_1, X_2, \ldots, X_n$ be a random sample from
the $N(\mu, \sigma^2)$ distribution. Then, since a linear combination of independent normal
variables is itself normal, $\bar{X} = \sum_{i=1}^{n} X_i/n \sim N(\mu, \sigma^2/n)$. Applying the technique
of standardization to $\bar{X}$, we have

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$


If σ is unknown, then it is usually replaced by $S = \sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2/(n - 1)}$,
and the statistic Z becomes $(\bar{X} - \mu)/(S/\sqrt{n})$. Now we no longer have a stan-
dard normal variate but a new variable whose distribution is known as
Student's t distribution.

Before 1908, this statistic was treated as an approximate Z variate in large-
sample experiments. William S. Gosset, a statistician at the Guinness Brewery
in Dublin, Ireland, empirically studied the distribution of the statistic
$(\bar{X} - \mu)/(S/\sqrt{n})$ for small samples, when the sampling was from a normal distribution
with mean μ and variance σ². Since the company did not allow the publication
of research by its scientists, he published his findings under the pseudonym
"Student." In 1923, Ronald A. Fisher theoretically derived the distribution of
the statistic $(\bar{X} - \mu)/(S/\sqrt{n})$, and since then it has come to be known as
Student's t distribution. The distribution is completely determined by a single
parameter ν = n - 1, known as the degrees of freedom. The distribution has
the same general form as the distribution of Z, in that both are symmetric
about a mean of zero. Both distributions are bell-shaped, but the t distribution
is more variable. In large samples, the t distribution is well approximated by
the standard normal distribution. It is only for small samples that the distinction
between the two distributions becomes important.

We use the symbol t[ν] to denote a t random variable with ν degrees of
freedom. A 100(1 - α)th percentile of the t[ν] random variable is denoted by
t[ν, 1 - α] and is the point on the t[ν] curve for which

$$P\{t[\nu] \le t[\nu, 1 - \alpha]\} = 1 - \alpha.$$


Useful tables of percentiles of the t distribution are given by Hald (1952), Owen
(1962), Fisher and Yates (1963), and Pearson and Hartley (1970). Appendix
Table III gives certain selected values of percentiles of t[ν]. The mean and
variance of a t[ν] variable are:

$$E(t[\nu]) = 0, \quad \nu > 1,$$

and

$$\mathrm{Var}(t[\nu]) = \frac{\nu}{\nu - 2}, \quad \nu > 2.$$
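The statistic and the variance formula above can be checked with a few lines of Python. This is a minimal sketch (function names hypothetical) using only the standard library; percentiles of t[ν] still have to come from Appendix Table III or a statistical library.

```python
import math
from statistics import mean, stdev

def t_statistic(sample, mu):
    """(Xbar - mu) / (S / sqrt(n)), with S the standard deviation using divisor n - 1."""
    return (mean(sample) - mu) / (stdev(sample) / math.sqrt(len(sample)))

def t_variance(nu):
    """Var(t[nu]) = nu / (nu - 2); the variance is infinite or undefined for nu <= 2."""
    if nu <= 2:
        raise ValueError("Var(t[nu]) requires nu > 2")
    return nu / (nu - 2)

print(t_statistic([1, 2, 3, 4, 5], mu=2))  # sqrt(2), about 1.414, on nu = 4 df
print(t_variance(10))                      # 1.25, compared with 1 for N(0, 1)
```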
B CHI-SQUARE DISTRIBUTION 


If $Z_1, Z_2, \ldots, Z_\nu$ are independently distributed random variables and each $Z_i$
has the N(0, 1) distribution, then the random variable

$$V = \sum_{i=1}^{\nu} Z_i^2$$

is said to have a chi-square distribution with ν degrees of freedom. Note that if
$X_1, X_2, \ldots, X_\nu$ are independently distributed and $X_i$ has the $N(\mu_i, \sigma_i^2)$ distri-
bution, then the random variable

$$\sum_{i=1}^{\nu} \frac{(X_i - \mu_i)^2}{\sigma_i^2}$$

also has the chi-square distribution with ν degrees of freedom.

We use the symbol χ²[ν] to denote a chi-square random variable with ν
degrees of freedom. The chi-square random variable has the reproductive pro-
perty; that is, the sum of two independent chi-square random variables is again a chi-square
random variable. More specifically, if $V_1$ and $V_2$ are independent random vari-
ables, where

$$V_1 \sim \chi^2[\nu_1] \quad\text{and}\quad V_2 \sim \chi^2[\nu_2],$$

then

$$V_1 + V_2 \sim \chi^2[\nu_1 + \nu_2].$$

The chi-square distribution was developed by Karl Pearson in 1900. A 100(1 -
α)th percentile of the χ²[ν] random variable is denoted by χ²[ν, 1 - α] and is
the point on the χ²[ν] curve for which

$$P\{\chi^2[\nu] \le \chi^2[\nu, 1 - \alpha]\} = 1 - \alpha.$$
Useful tables of percentiles of chi-square distributions are given by Hald and
Sinkbaek (1950), Vanderbeck and Cook (1961), Harter (1964a,b), and Pearson
and Hartley (1970). Appendix Table IV gives certain selected values of per-
centiles of χ²[ν]. The mean and variance of a χ²[ν] variable are:

$$E(\chi^2[\nu]) = \nu$$

and

$$\mathrm{Var}(\chi^2[\nu]) = 2\nu.$$
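A quick Monte Carlo check of the definition, the moments, and the reproductive property is sketched below, generating chi-square variates directly as sums of squared standard normals. The seed, sample sizes, and function name are arbitrary choices for illustration, so the printed values only approximate the theoretical moments.

```python
import random
from statistics import mean, variance

def chi2_draws(nu, size, rng):
    """Chi-square[nu] variates generated as sums of nu squared N(0, 1) draws."""
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu)) for _ in range(size)]

rng = random.Random(2024)
v5 = chi2_draws(5, 20000, rng)
print(mean(v5), variance(v5))      # near E = 5 and Var = 10

# Reproductive property: chi-square[3] + independent chi-square[4] behaves like chi-square[7]
sums = [a + b for a, b in zip(chi2_draws(3, 20000, rng), chi2_draws(4, 20000, rng))]
print(mean(sums), variance(sums))  # near 7 and 14
```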


C SAMPLING DISTRIBUTION OF (n - 1)S²/σ²


Let $X_1, X_2, \ldots, X_n$ be a random sample from the $N(\mu, \sigma^2)$ distribution. Define
the sample mean $\bar{X}$ and variance $S^2$ as

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \quad\text{and}\quad S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}.$$

The sampling distribution of the statistic $(n - 1)S^2/\sigma^2$ is of particular interest
in many statistical problems, and we consider its distribution here.

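By the addition and subtraction of the sample mean $\bar{X}$, it is easy to see
that

$$\begin{aligned}
\sum_{i=1}^{n}(X_i - \mu)^2 &= \sum_{i=1}^{n}\left[(X_i - \bar{X}) + (\bar{X} - \mu)\right]^2 \\
&= \sum_{i=1}^{n}(X_i - \bar{X})^2 + \sum_{i=1}^{n}(\bar{X} - \mu)^2 + 2(\bar{X} - \mu)\sum_{i=1}^{n}(X_i - \bar{X}) \\
&= \sum_{i=1}^{n}(X_i - \bar{X})^2 + n(\bar{X} - \mu)^2,
\end{aligned}$$

where the cross-product term vanishes because $\sum_{i=1}^{n}(X_i - \bar{X}) = 0$.
Dividing each term on both sides of the equality by σ² and substituting $(n - 1)S^2$
for $\sum_{i=1}^{n}(X_i - \bar{X})^2$, we obtain

$$\frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \mu)^2 = \frac{(n - 1)S^2}{\sigma^2} + \frac{(\bar{X} - \mu)^2}{\sigma^2/n}.$$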
Now, from Appendix B, it follows that $\sum_{i=1}^{n}(X_i - \mu)^2/\sigma^2$ is a chi-square
random variable with n degrees of freedom. The second term on the right-
hand side of the equality is the square of a standard normal variate, since
$\bar{X}$ is a normal random variable with mean μ and variance σ²/n. Therefore,
the quantity $(\bar{X} - \mu)^2/(\sigma^2/n)$ has a chi-square distribution with one degree of
freedom. Using advanced statistical techniques, one can also show that the two
variables $(n - 1)S^2/\sigma^2$ and $(\bar{X} - \mu)^2/(\sigma^2/n)$ are independent (see,
e.g., Hogg and Craig, 1995, pp. 214-217). Thus, from the reproductive prop-
erty of the chi-square variable, read in reverse through moment generating functions,
it follows that $(n - 1)S^2/\sigma^2$ has a chi-square distribution with n - 1 degrees of freedom:
if M(t) denotes its moment generating function, independence gives
$(1 - 2t)^{-n/2} = M(t)(1 - 2t)^{-1/2}$, so that $M(t) = (1 - 2t)^{-(n-1)/2}$.
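The algebraic identity and the distributional result can both be checked numerically. The sketch below (hypothetical names, arbitrary seed and data) verifies the decomposition for one sample and then simulates (n - 1)S²/σ² to confirm that its mean and variance are close to n - 1 and 2(n - 1).

```python
import random
from statistics import mean, variance

def decomposition(sample, mu):
    """Both sides of sum (X_i - mu)^2 = (n - 1) S^2 + n (Xbar - mu)^2."""
    n = len(sample)
    lhs = sum((x - mu) ** 2 for x in sample)
    rhs = (n - 1) * variance(sample) + n * (mean(sample) - mu) ** 2
    return lhs, rhs

print(decomposition([2.1, 3.7, 1.4, 5.0, 4.2], mu=3.0))  # the two sides agree

# Monte Carlo check that W = (n - 1) S^2 / sigma^2 behaves like chi-square[n - 1]
rng = random.Random(3)
n, mu, sigma = 6, 10.0, 2.0
w = [(n - 1) * variance([rng.gauss(mu, sigma) for _ in range(n)]) / sigma ** 2
     for _ in range(20000)]
print(mean(w), variance(w))  # near n - 1 = 5 and 2(n - 1) = 10
```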


D F DISTRIBUTION


If $V_1$ and $V_2$ are two independent random variables, where $V_1 \sim \chi^2[\nu_1]$ and
$V_2 \sim \chi^2[\nu_2]$, then the random variable

$$\frac{V_1/\nu_1}{V_2/\nu_2}$$

is said to have an F distribution with $\nu_1$ and $\nu_2$ degrees of freedom, respectively.
We use the symbol $F[\nu_1, \nu_2]$ to denote an F variable with $\nu_1$ and $\nu_2$ degrees of
freedom.

Suppose a random sample of size $n_1$ is drawn from the $N(\mu_1, \sigma_1^2)$ distribution and an
independent random sample of size $n_2$ is drawn from the $N(\mu_2, \sigma_2^2)$ distribution.
If $S_1^2$ and $S_2^2$ are the corresponding sample variances, then, from Appendix C,
it follows that

$$\frac{(n_1 - 1)S_1^2}{\sigma_1^2} \sim \chi^2[n_1 - 1] \quad\text{and}\quad \frac{(n_2 - 1)S_2^2}{\sigma_2^2} \sim \chi^2[n_2 - 1].$$

Therefore, the quotient

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\chi^2[n_1 - 1]/(n_1 - 1)}{\chi^2[n_2 - 1]/(n_2 - 1)} \sim F[n_1 - 1, n_2 - 1].$$


Fisher (1924) considered the theoretical distribution of $\frac{1}{2}\log_e(S_1^2/S_2^2)$, known
as the Z distribution. The distribution of the variance ratio F was derived by
Snedecor (1934), who showed that the distribution of F was simply a trans-
formation of Fisher's Z distribution. Snedecor named the distribution of the
variance ratio F in honor of Fisher. The distribution has subsequently come to
be known as Snedecor's F distribution.
Note that the number of degrees of freedom associated with the chi-square
random variable appearing in the numerator of F is always stated first, followed
by the number of degrees of freedom associated with the chi-square random
variable appearing in the denominator. Thus, the curve of the F distribution
depends not only on the two parameters $\nu_1$ and $\nu_2$, but also on the order in
which they occur.

A 100(1 - α)th percentile value of the $F[\nu_1, \nu_2]$ variable is denoted by
$F[\nu_1, \nu_2; 1 - \alpha]$ and is that point on the $F[\nu_1, \nu_2]$ curve for which

$$P\{F[\nu_1, \nu_2] \le F[\nu_1, \nu_2; 1 - \alpha]\} = 1 - \alpha.$$


Useful tables of percentiles of the F distribution are given by Hald (1952),
Fisher and Yates (1963), and Pearson and Hartley (1970). Mardia and Zemroch
(1978) compiled tables of F distributions that include fractional values of $\nu_1$
and $\nu_2$. The fractional values of the degrees of freedom are useful when an
F distribution is used as an approximation. Appendix Table V gives certain
selected values of percentiles of $F[\nu_1, \nu_2]$ for various sets of values of $\nu_1$ and
$\nu_2$. The mean and variance of an $F[\nu_1, \nu_2]$ variable are:

$$E(F[\nu_1, \nu_2]) = \frac{\nu_2}{\nu_2 - 2}, \quad \nu_2 > 2,$$

and

$$\mathrm{Var}(F[\nu_1, \nu_2]) = \frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)}, \quad \nu_2 > 4.$$
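The moment formulas can be coded directly and checked by building F variates from their definition as a ratio of independent chi-squares divided by their degrees of freedom. This is a minimal simulation sketch with hypothetical function names and an arbitrary seed; it does not replace the percentile tables.

```python
import random
from statistics import mean

def f_mean(nu1, nu2):
    """E(F[nu1, nu2]) = nu2 / (nu2 - 2), valid for nu2 > 2."""
    return nu2 / (nu2 - 2)

def f_var(nu1, nu2):
    """Var(F[nu1, nu2]) = 2 nu2^2 (nu1 + nu2 - 2) / [nu1 (nu2 - 2)^2 (nu2 - 4)], nu2 > 4."""
    return 2 * nu2 ** 2 * (nu1 + nu2 - 2) / (nu1 * (nu2 - 2) ** 2 * (nu2 - 4))

def f_draw(nu1, nu2, rng):
    """One F[nu1, nu2] variate formed as (V1/nu1)/(V2/nu2) from independent chi-squares."""
    v1 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu1))
    v2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu2))
    return (v1 / nu1) / (v2 / nu2)

rng = random.Random(5)
draws = [f_draw(5, 10, rng) for _ in range(20000)]
print(f_mean(5, 10), f_var(5, 10), mean(draws))  # 1.25, about 1.354, and a value near 1.25
```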


E NONCENTRAL CHI-SQUARE DISTRIBUTION 


If $X_1, X_2, \ldots, X_\nu$ are independently distributed random variables and each $X_i$
has the $N(\mu_i, 1)$ distribution, then the random variable

$$V = \sum_{i=1}^{\nu} X_i^2$$

is said to have a noncentral chi-square distribution with ν degrees of freedom.
The quantity $\lambda = \left(\sum_{i=1}^{\nu} \mu_i^2\right)^{1/2}$ is known as the noncentrality parameter of the
distribution. We use the symbol $\chi'^2[\nu, \lambda]$ to denote a noncentral chi-square
random variable with ν degrees of freedom and the noncentrality parameter λ.

Note that the ordinary or central chi-square distribution is the special case
of the noncentral distribution when the noncentrality parameter λ = 0. In
the statistical literature the noncentrality parameter is sometimes defined dif-
ferently: some authors use $\lambda = \sum_{i=1}^{\nu} \mu_i^2$, whereas others use $\lambda = \frac{1}{2}\sum_{i=1}^{\nu} \mu_i^2$,
both using the same symbol λ. The noncentral chi-square variable, like the
central chi-square, possesses the reproductive property; that is, the sum of
two independent noncentral chi-square variables is again a noncentral chi-square variable.
More specifically, if $V_1$ and $V_2$ are independent random variables such
that

$$V_1 \sim \chi'^2[\nu_1, \lambda_1] \quad\text{and}\quad V_2 \sim \chi'^2[\nu_2, \lambda_2],$$

then

$$V_1 + V_2 \sim \chi'^2\left[\nu_1 + \nu_2, (\lambda_1^2 + \lambda_2^2)^{1/2}\right],$$

the noncentrality parameters combining through their squares because λ is defined as a square root.


The noncentral chi-square distribution can be approximated in terms of a
central chi-square distribution. A detailed summary of various approximations
including a great deal of other information can be found in Tiku (1985a) and
Johnson et al. (1995, Chapter 29). Tables of the noncentral chi-square distribu-
tion have been prepared by Hayman et al. (1973). The mean and variance of a
$\chi'^2[\nu, \lambda]$ variable are:

$$E(\chi'^2[\nu, \lambda]) = \nu + \lambda^2$$

and

$$\mathrm{Var}(\chi'^2[\nu, \lambda]) = 2(\nu + 2\lambda^2).$$
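The following sketch simulates a noncentral chi-square variable from its definition and compares the simulated moments with ν + λ² and 2(ν + 2λ²). It assumes the convention λ = (Σμ_i²)^{1/2}; under the alternative conventions mentioned above the formulas would have to be rewritten. The means μ_i, the seed, and the function name are hypothetical.

```python
import math
import random
from statistics import mean, variance

def noncentral_chi2_draw(mus, rng):
    """One chi'^2[nu, lambda] variate: sum of squares of independent N(mu_i, 1) draws."""
    return sum(rng.gauss(mu, 1.0) ** 2 for mu in mus)

mus = [1.0, 0.5, 2.0]                     # hypothetical means mu_1, mu_2, mu_3
nu = len(mus)
lam = math.sqrt(sum(m * m for m in mus))  # lambda = (sum of mu_i^2)^(1/2)
rng = random.Random(7)
draws = [noncentral_chi2_draw(mus, rng) for _ in range(20000)]
# Theory: mean nu + lambda^2 = 8.25 and variance 2(nu + 2 lambda^2) = 27.0
print(mean(draws), nu + lam ** 2, variance(draws), 2 * (nu + 2 * lam ** 2))
```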


F NONCENTRAL AND DOUBLY NONCENTRAL 
t DISTRIBUTIONS 


If U and V are two independent random variables, where $U \sim N(\delta, 1)$ and
$V \sim \chi^2[\nu]$, then the random variable

$$\frac{U}{\sqrt{V/\nu}}$$

is said to have a noncentral t distribution with ν degrees of freedom and the
noncentrality parameter δ. We use the symbol $t'[\nu, \delta]$ to denote a noncentral
t distribution with ν degrees of freedom and the noncentrality parameter δ.
The distribution is useful in evaluating the power of the t test. There are many
approximations of the noncentral t distribution in terms of normal and (cen-
tral) t distributions. A detailed summary of various approximations and other
results can be found in Johnson et al. (1995, Chapter 31). A great deal of
other information about the noncentral t distribution is given in Owen (1968,
1985). Tables of the noncentral t distribution have been prepared by Resnikoff
and Lieberman (1957) and Bagui (1993). The mean and variance of a $t'[\nu, \delta]$
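variable are:

$$E(t'[\nu, \delta]) = \delta\left(\frac{\nu}{2}\right)^{1/2}\frac{\Gamma\left(\frac{\nu - 1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}, \quad \nu > 1,$$

and

$$\mathrm{Var}(t'[\nu, \delta]) = \frac{\nu}{\nu - 2}(1 + \delta^2) - \frac{\nu}{2}\delta^2\left[\frac{\Gamma\left(\frac{\nu - 1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}\right]^2, \quad \nu > 2.$$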
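The mean and variance formulas use only the gamma function, so they are easy to evaluate; the sketch below also simulates U/√(V/ν) directly as a rough check. Function names, the seed, and the choice ν = 10, δ = 1.5 are illustrative assumptions.

```python
import math
import random
from statistics import mean

def nct_mean(nu, delta):
    """E(t'[nu, delta]) = delta sqrt(nu/2) Gamma((nu-1)/2) / Gamma(nu/2), for nu > 1."""
    return delta * math.sqrt(nu / 2) * math.gamma((nu - 1) / 2) / math.gamma(nu / 2)

def nct_var(nu, delta):
    """Var(t'[nu, delta]) = nu (1 + delta^2) / (nu - 2) - E(t')^2, for nu > 2."""
    return nu * (1 + delta ** 2) / (nu - 2) - nct_mean(nu, delta) ** 2

def nct_draw(nu, delta, rng):
    """One t'[nu, delta] variate: U / sqrt(V / nu), U ~ N(delta, 1), V ~ chi-square[nu]."""
    u = rng.gauss(delta, 1.0)
    v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return u / math.sqrt(v / nu)

rng = random.Random(13)
draws = [nct_draw(10, 1.5, rng) for _ in range(20000)]
print(nct_mean(10, 1.5), mean(draws))  # formula (about 1.626) versus simulation
```

With δ = 0 the formulas reduce to the central values E = 0 and Var = ν/(ν - 2).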


If V is a noncentral chi-square variable with ν degrees of freedom and the noncentrality
parameter λ, independent of U, then

$$\frac{U}{\sqrt{V/\nu}}$$

has a doubly noncentral t distribution with noncentrality parameters δ and λ,
respectively. Tables of the doubly noncentral t distribution are given by Bulgren
(1974). Further information including analytic expressions for the distribution
function and some computational aspects can be found in Johnson et al. (1995,
pp. 533-537) and references cited therein.


G NONCENTRAL F DISTRIBUTION 


If $V_1$ and $V_2$ are two independent random variables, where $V_1 \sim \chi'^2[\nu_1, \lambda]$ and
$V_2 \sim \chi^2[\nu_2]$, then the random variable

$$\frac{V_1/\nu_1}{V_2/\nu_2}$$

is said to have a noncentral F distribution with $\nu_1$ and $\nu_2$ degrees of freedom
and the noncentrality parameter λ. We use the symbol $F'[\nu_1, \nu_2; \lambda]$ to denote a
noncentral F variable with $\nu_1$ and $\nu_2$ degrees of freedom and the noncentrality
parameter λ.

It is sometimes useful to approximate a noncentral F distribution in terms
of a (central) F distribution. A detailed summary of various approximations
including a great deal of other information can be found in Tiku (1985b) and
Johnson et al. (1995, Chapter 30). Comprehensive tables of the noncentral
F distribution were prepared by Tiku (1967, 1972). Tables and charts of the
noncentral F distribution are discussed in Section 2.17. The mean and variance
of an $F'[\nu_1, \nu_2; \lambda]$ variable are:

$$E(F'[\nu_1, \nu_2; \lambda]) = \frac{\nu_2(\nu_1 + \lambda^2)}{\nu_1(\nu_2 - 2)}, \quad \nu_2 > 2,$$

and

$$\mathrm{Var}(F'[\nu_1, \nu_2; \lambda]) = 2\left(\frac{\nu_2}{\nu_1}\right)^2 \frac{(\nu_1 + \lambda^2)^2 + (\nu_1 + 2\lambda^2)(\nu_2 - 2)}{(\nu_2 - 2)^2(\nu_2 - 4)}, \quad \nu_2 > 4.$$
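As with the central case, the noncentral F moments can be coded and compared with a simulation of (V₁/ν₁)/(V₂/ν₂). The sketch assumes λ = (Σμ_i²)^{1/2}, so λ enters the formulas as λ²; with λ = 0 it reduces to the central F moments. Names, the seed, and the means μ_i are hypothetical.

```python
import math
import random
from statistics import mean

def ncf_mean(nu1, nu2, lam):
    """E(F'[nu1, nu2; lambda]) = nu2 (nu1 + lambda^2) / [nu1 (nu2 - 2)], nu2 > 2."""
    return nu2 * (nu1 + lam ** 2) / (nu1 * (nu2 - 2))

def ncf_var(nu1, nu2, lam):
    """Var(F'[nu1, nu2; lambda]) for nu2 > 4, with lambda = (sum of mu_i^2)^(1/2)."""
    l2 = lam ** 2
    top = 2 * (nu2 / nu1) ** 2 * ((nu1 + l2) ** 2 + (nu1 + 2 * l2) * (nu2 - 2))
    return top / ((nu2 - 2) ** 2 * (nu2 - 4))

def ncf_draw(mus, nu2, rng):
    """(V1/nu1)/(V2/nu2), with V1 a noncentral chi-square built from N(mu_i, 1) draws."""
    v1 = sum(rng.gauss(mu, 1.0) ** 2 for mu in mus)
    v2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu2))
    return (v1 / len(mus)) / (v2 / nu2)

mus, nu2 = [1.0, 2.0], 12                 # hypothetical means; lambda^2 = 5
lam = math.sqrt(sum(m * m for m in mus))
rng = random.Random(9)
draws = [ncf_draw(mus, nu2, rng) for _ in range(20000)]
print(ncf_mean(2, nu2, lam), mean(draws))  # theory 4.2 versus the simulated mean
```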
H DOUBLY NONCENTRAL F DISTRIBUTION 


If $V_1$ and $V_2$ are two independent random variables, where $V_1 \sim \chi'^2[\nu_1, \lambda_1]$ and
$V_2 \sim \chi'^2[\nu_2, \lambda_2]$, then the random variable

$$\frac{V_1/\nu_1}{V_2/\nu_2}$$

is said to have a doubly noncentral F distribution. We use the symbol $F''[\nu_1, \nu_2; \lambda_1, \lambda_2]$
to denote a doubly noncentral F variable with $\nu_1$ and $\nu_2$ degrees of
freedom and noncentrality parameters $\lambda_1$ and $\lambda_2$. Thus, the doubly noncentral
F distribution is the distribution of the ratio of two independent variables, each distributed as a
noncentral chi-square and divided by its degrees of freedom. Tables
of the doubly noncentral F distribution were given by Tiku (1974). Further
discussions and details about the distribution including applications in contexts
other than analysis of variance can be found in Tiku (1974) and Johnson et al.
(1995, pp. 499-502).

The doubly noncentral F distribution is related to the doubly noncentral beta
distribution in the same way as the (central) F is related to the (central) beta;
the doubly noncentral beta is the distribution of $V_1/(V_1 + V_2)$.


I STUDENTIZED RANGE DISTRIBUTION


Suppose that $(X_1, X_2, \ldots, X_p)$ is a random sample from a normal distribution
with mean μ and variance σ². Suppose further that S² is an unbiased estimate
of σ², based upon ν degrees of freedom and independent of the $X_i$'s. Then the ratio

$$q[p, \nu] = \{\max(X_i) - \min(X_i)\}/S$$

is called the Studentized range, where the arguments in the square brackets
indicate that the distribution of q depends on p and ν. In general, the Studentized
range arises as the ratio of the range of a sample of size p from a standard normal
population to the square root of an independent χ²[ν]/ν variable with ν degrees
of freedom. In analysis of variance applications, the normal variates are usually
means of independent samples of the same size, and the denominator is an
independent estimate of their common standard error. The sampling distribution
of q[p, ν] has been tabulated by various workers, and good tables are available
in Harter (1960), Owen (1962), Pearson and Hartley (1970), and Miller (1981).
Perhaps the most comprehensive table of percentiles of the Studentized range
distribution is Table B2 given in Harter (1969a), which has p = 2(1)20(2)40(10)160;
ν = 1(1)20, 24, 30, 40, 60, 120, ∞; and upper-tail α = 0.001, 0.005,
0.01, 0.025, 0.05, 0.1(0.1)0.9, 0.95, 0.975, 0.99, 0.995, 0.999. Some selected
percentiles of q[p, ν] are given in Appendix Table X. The table is fairly easy
to use. Let q[p, ν; 1 - α] denote the 100(1 - α)th percentile of the q[p, ν]
variable. Suppose p = 10 and ν = 20. The 90th percentile of the Studentized
range distribution is then given by

$$q[10, 20; 0.90] = 4.51.$$

Thus, with 10 observations from a normal population, the probability
is 0.90 that their range is not more than 4.51 times as great as an independent
sample standard deviation based on 20 degrees of freedom.
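The interpretation of q[10, 20; 0.90] = 4.51 can be checked by simulation: generate 10 standard normal values and an independent χ²[20] variable many times and count how often the Studentized range falls at or below 4.51. This is a rough Monte Carlo sketch with an arbitrary seed and a hypothetical function name, not a way to produce table-quality percentiles.

```python
import math
import random

def studentized_range_draw(p, nu, rng):
    """One q[p, nu] variate: range of p N(0, 1) draws over sqrt(chi-square[nu] / nu)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    s = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu)) / nu)
    return (max(z) - min(z)) / s

rng = random.Random(11)
reps = 20000
coverage = sum(studentized_range_draw(10, 20, rng) <= 4.51 for _ in range(reps)) / reps
print(coverage)  # should be close to 0.90 if q[10, 20; 0.90] = 4.51
```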


J STUDENTIZED MAXIMUM MODULUS DISTRIBUTION


The Studentized maximum modulus is the maximum absolute value of a set
of independent unit normal variates, which is then Studentized by the standard
deviation. Thus, let $X_1, X_2, \ldots, X_p$ be a random sample from the $N(\mu, \sigma^2)$
distribution. Then the Studentized maximum modulus statistic is defined by

$$m[p, \nu] = \frac{\max_i |X_i - \bar{X}|}{S},$$

where $\bar{X} = \sum_{i=1}^{p} X_i/p$ and $S^2 = \sum_{i=1}^{p}(X_i - \bar{X})^2/(p - 1)$. For the case where S²
represents an independent estimate of σ² such that $\nu S^2/\sigma^2$ has a chi-square dis-
tribution, the distribution was first derived and tabulated by Nair (1948). The
critical points for the Studentized maximum modulus distribution can also be
obtained by taking the square roots of the entries in the tables of the Studentized
largest chi-square distribution with one degree of freedom for the numerator as
given by Armitage and Krishnaiah (1964). Pillai and Ramachandran (1954) gave
a table for α = 0.05; p = 1(1)8; and ν = 5(5)20, 24, 30, 40, 60, 120, ∞. Dunn
and Massey (1965) provided a table for α = 0.01, 0.025, 0.05, 0.10(0.1)0.50;
p = 2, 6, 10, 20; and ν = 4, 10, 30, ∞. Hahn and Hendrickson (1971) give gen-
eral tables of percentiles of m[p, ν] for α = 0.01, 0.05, 0.10; p = 1(1)6(2)12,
15, 20; and ν = 3(1)12, 15, 20, 25, 30, 40, 60, which also appear in Miller
(1981, pp. 277-278). Stoline and Ury (1979) and Ury et al. (1980) give spe-
cial tables with p = a(a - 1)/2 for a = 2(1)20 and ν = 2(2)50(5)80(10)100,
respectively. Bechhoefer and Dunnett (1982) have provided tables of the dis-
tribution of m[p, ν] for p = 2(1)32; ν = 2(1)12(2)20, 24, 30, 40, 60, ∞; and
α = 0.10, 0.05, and 0.01. These tables are abridged versions of more extensive
tables given by Bechhoefer and Dunnett (1981). Hochberg and Tamhane (1987)
also give tables with p = a(a - 1)/2 for a = 2(1)16(2)20 and ν = 2(1)30(5)
50(10)60(20)120, 200, ∞. Some selected percentiles of m[p, ν] are given in
Appendix Table XV.


K SATTERTHWAITE PROCEDURE AND ITS APPLICATION 
TO ANALYSIS OF VARIANCE 


Many analysis of variance applications involve a linear combination of mean
squares. Let $S_i^2$ (i = 1, 2, ..., p) be p independent mean squares such that $\nu_i S_i^2/\sigma_i^2$ has a chi-
square distribution with $\nu_i$ degrees of freedom. Consider the linear combination
$S^2 = \sum_{i=1}^{p} \ell_i S_i^2$, where the $\ell_i$'s are known constants. The Satterthwaite (1946) proce-
dure states that $\nu S^2/\sigma^2$ is distributed approximately as a chi-square distribution
with ν degrees of freedom, where $\sigma^2 = E(S^2)$ and ν is determined by

$$\nu = \frac{\left(\sum_{i=1}^{p} \ell_i S_i^2\right)^2}{\sum_{i=1}^{p} \ell_i^2 S_i^4/\nu_i}.$$
The Satterthwaite procedure is frequently employed for constructing confidence
intervals for the mean and the variance components in random and mixed
effects analysis of variance. For example, if a variance component σ² is esti-
mated by $S^2 = \sum_{i=1}^{p} \ell_i S_i^2$, then an approximate 100(1 - α)% confidence inter-
val for σ² is given by

$$\frac{\nu S^2}{\chi^2[\nu, 1 - \alpha/2]} \le \sigma^2 \le \frac{\nu S^2}{\chi^2[\nu, \alpha/2]},$$

where $\chi^2[\nu, \alpha/2]$ and $\chi^2[\nu, 1 - \alpha/2]$ are the 100(α/2)th and 100(1 - α/2)th
percentiles (that is, the lower and upper 100(α/2)% points) of the chi-square
distribution with ν degrees of freedom, and ν is determined by the formula given previously.
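The Satterthwaite degrees of freedom and the resulting interval are easy to compute once the mean squares are known. In the sketch below the function names and the numbers are hypothetical, and the chi-square percentiles (here the 2.5th and 97.5th percentiles of χ²[20], 9.591 and 34.170) must be supplied from a table, interpolating when ν is fractional.

```python
def satterthwaite(coefs, mean_squares, dfs):
    """Return S^2 = sum l_i S_i^2 and nu = S^4 / sum(l_i^2 S_i^4 / nu_i)."""
    s2 = sum(l * m for l, m in zip(coefs, mean_squares))
    denom = sum((l * m) ** 2 / d for l, m, d in zip(coefs, mean_squares, dfs))
    return s2, s2 ** 2 / denom

def satterthwaite_ci(s2, nu, chi2_lo, chi2_hi):
    """Approximate CI: nu S^2 / chi2[nu, 1 - a/2] <= sigma^2 <= nu S^2 / chi2[nu, a/2].

    chi2_lo = chi2[nu, a/2] and chi2_hi = chi2[nu, 1 - a/2] are table look-ups,
    passed in as inputs rather than computed here.
    """
    return nu * s2 / chi2_hi, nu * s2 / chi2_lo

# Hypothetical example: S^2 = MS_1 + MS_2 with MS_1 = 4 (10 df) and MS_2 = 2 (20 df)
s2, nu = satterthwaite([1, 1], [4.0, 2.0], [10, 20])
print(s2, nu)                                   # 6.0 and approximately 20
print(satterthwaite_ci(s2, 20, 9.591, 34.170))  # approximate 95% limits for sigma^2
```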

Another application of the Satterthwaite procedure involves the construction
of a pseudo-F test when an exact F test cannot be found from the ratio of
two mean squares. In such cases, one can form linear combinations of mean
squares for the numerator, for the denominator, or for both the numerator and the
denominator such that their expected values are equal under the null hypothesis.
For example, let

$$MS' = \ell_1 S_1^2 + \cdots + \ell_r S_r^2$$

and

$$MS'' = \ell_{r+1} S_{r+1}^2 + \cdots + \ell_p S_p^2,$$

where the mean squares are chosen such that E(MS') = E(MS'') under the null
hypothesis that a particular variance component is zero. Now, an approximate
F test of the null hypothesis can be obtained by the statistic

$$F = \frac{MS'}{MS''},$$

which has an approximate F distribution with ν' and ν'' degrees of freedom
determined by

$$\nu' = \frac{(\ell_1 S_1^2 + \cdots + \ell_r S_r^2)^2}{\ell_1^2 S_1^4/\nu_1 + \cdots + \ell_r^2 S_r^4/\nu_r}$$

and

$$\nu'' = \frac{(\ell_{r+1} S_{r+1}^2 + \cdots + \ell_p S_p^2)^2}{\ell_{r+1}^2 S_{r+1}^4/\nu_{r+1} + \cdots + \ell_p^2 S_p^4/\nu_p}.$$


In many situations, it may not be necessary to approximate both the numerator
and the denominator mean squares for an approximate F test. However, when
both the numerator and the denominator mean squares are constructed, it is al-
ways possible to find additive combinations of mean squares, and thereby avoid
subtracting mean squares, which may result in a poor approximation. For some
further discussions of pseudo-F tests, see Anderson (1960) and Eisen (1966).
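A pseudo-F test applies the same Satterthwaite calculation to the numerator and to the denominator. As a hypothetical illustration, in a three-way random model with three levels of each factor, MS' = MS_A + MS_ABC and MS'' = MS_AB + MS_AC have equal expectations when σ²_A = 0; the mean squares below are made-up numbers, and the function names are hypothetical. Negative coefficients can also be passed, subject to the cautions discussed next.

```python
def satterthwaite_combination(terms):
    """Combine (coefficient, mean square, df) triples; return the combination and its df."""
    ms = sum(l * m for l, m, _ in terms)
    df = ms ** 2 / sum((l * m) ** 2 / d for l, m, d in terms)
    return ms, df

def pseudo_f(numerator_terms, denominator_terms):
    """Approximate F = MS' / MS'' with Satterthwaite degrees of freedom (nu', nu'')."""
    ms1, df1 = satterthwaite_combination(numerator_terms)
    ms2, df2 = satterthwaite_combination(denominator_terms)
    return ms1 / ms2, df1, df2

# Hypothetical three-way random model, a = b = c = 3:
# MS' = MS_A + MS_ABC (2 and 8 df), MS'' = MS_AB + MS_AC (4 and 4 df)
f, nu1, nu2 = pseudo_f([(1, 40.0, 2), (1, 4.0, 8)], [(1, 10.0, 4), (1, 12.0, 4)])
print(f, nu1, nu2)  # F = 2.0 on about 2.41 and 7.93 degrees of freedom
```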

In many applications of the Satterthwaite procedure, some of the mean
squares may involve negative coefficients. Satterthwaite remarked that care
should be exercised in applying the approximation when some of the coeffi-
cients are negative. When negative coefficients are involved, one can rewrite
the linear combination as $S^2 = S_A^2 - S_B^2$, where $S_A^2$ contains all the mean squares
with positive coefficients and $S_B^2$ those with negative coefficients. Now, the degrees
of freedom associated with the approximate chi-square distribution of S² are
determined by

$$f = \frac{(S_A^2 - S_B^2)^2}{S_A^4/f_A + S_B^4/f_B},$$

where $f_A$ and $f_B$ are the degrees of freedom associated with the approximate
chi-square distributions of $S_A^2$ and $S_B^2$, respectively. Gaylor and Hopper (1969)
showed that the Satterthwaite approximation for S² with f degrees of freedom is
an adequate one when

$$S_A^2/S_B^2 > F[f_B, f_A; 0.975] \times F[f_A, f_B; 0.50]$$

if $f_A < 100$ and $f_B > f_A/2$. The approximation is usually adequate for
differences of mean squares when the mean squares being subtracted are relatively

580 The Analysis of Variance 


small. Khuri (1995) gives a necessary and sufficient condition for the Satterth- 
waite approximation to be exact in balanced mixed models. 
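A sketch of the negative-coefficient case follows, assuming $S_A^2$, $S_B^2$, and their degrees of freedom are already known. The two F percentiles in the Gaylor and Hopper check must be supplied by the caller (for example, from F tables); the function names and the choice to return None outside the stated range of $f_A$ and $f_B$ are illustrative assumptions.

```python
def df_for_difference(s2_a, f_a, s2_b, f_b):
    """Satterthwaite df f for S^2 = S_A^2 - S_B^2."""
    return (s2_a - s2_b) ** 2 / (s2_a ** 2 / f_a + s2_b ** 2 / f_b)


def gaylor_hopper_adequate(s2_a, f_a, s2_b, f_b, f_975_ba, f_50_ab):
    """Check S_A^2 / S_B^2 > F[f_B, f_A; 0.975] * F[f_A, f_B; 0.50].

    f_975_ba and f_50_ab are the two F percentiles, supplied by the caller.
    Returns None when f_A < 100 and f_B > f_A / 2 do not both hold, since
    the criterion is stated only for that range.
    """
    if not (f_a < 100 and f_b > f_a / 2):
        return None
    return s2_a / s2_b > f_975_ba * f_50_ab


f = df_for_difference(10.0, 4, 2.0, 20)  # hypothetical mean squares and df
print(round(f, 3))
```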


L COMPONENTS OF VARIANCE 


In discussing Models II and III, we have introduced variances corresponding 
to the random effects terms in the analysis of variance model. These have been 
designated “components of variance” since they represent the parts of the total 
variation that can be ascribed to these sources. The variance components are 
associated with random effects and appear in both random and mixed models. 

Variance components were first employed by Fisher (1918) in connection 
with genetic research on Mendelian laws of inheritance. They have been widely 
used in evaluating the precision of instruments and, in general, are useful in 
determining the variables that contribute most to the variability of the process 
or the different sources contributing to the variation in an observation. This
permits corrective action to be taken to reduce the effects of these variables.
Another use has been described by Cameron (1951), who used variance
components in evaluating the precision of estimating the clean content of wool. 
Kussmaul and Anderson (1967) describe the application of variance compo- 
nents for analyzing composite samples, which are obtained by pooling data 
from individual samples. 

There is a large body of literature on variance components, covering
hypothesis testing, point estimation, and confidence intervals, and fairly
complete bibliographies are provided by Sahai (1979), Sahai et al. (1985), and
Singhal et al. (1988). Additional works of interest include survey papers by
Crump (1951), Searle (1971a, 1995), Khuri and Sahai (1985), and Burdick and
Graybill (1988), as well as texts and monographs by Rao and Kleffe (1988),
Burdick and Graybill (1992), Searle et al. (1992), and Rao (1997).


M INTRACLASS CORRELATION


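In the random effects model

$$y_{ij} = \mu + \alpha_i + e_{ij}, \quad i = 1, \ldots, a;\ j = 1, \ldots, n,$$

$\mu$ is considered to be a fixed constant and the $\alpha_i$'s and the $e_{ij}$'s are independently
distributed random variables with mean zero and variances $\sigma_\alpha^2$ and $\sigma_e^2$, respectively.
Thus, as a part of the model,

$$E(y_{ij}) = E(\mu) + E(\alpha_i) + E(e_{ij}) = \mu + 0 + 0 = \mu$$

and

$$\operatorname{Var}(y_{ij}) = \operatorname{Var}(\mu) + \operatorname{Var}(\alpha_i) + \operatorname{Var}(e_{ij}) = 0 + \sigma_\alpha^2 + \sigma_e^2 = \sigma_\alpha^2 + \sigma_e^2.$$

The covariance structure of the model may be represented as follows:

$$\operatorname{Cov}(y_{ij}, y_{i'j'}) = \begin{cases} 0, & \text{if } i \ne i', \\ \sigma_e^2 + \sigma_\alpha^2, & \text{if } i = i',\ j = j', \\ \sigma_\alpha^2, & \text{if } i = i',\ j \ne j'. \end{cases}$$

The intraclass correlation is then defined by

$$\rho = \frac{\operatorname{Cov}(y_{ij}, y_{ij'})}{\sqrt{\operatorname{Var}(y_{ij})}\sqrt{\operatorname{Var}(y_{ij'})}} = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_e^2}.$$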


Thus, $\rho$ is the correlation between any pair of individuals belonging to the same
class and has the range of values from $-1/(n-1)$ to 1. The intraclass correlation
was first introduced by Fisher (1918) as a measure of the correlation between
the members of the same family, group, or class. It can be interpreted as the
proportion of the total variability due to the differences in all possible treatment
groups of this type. The intraclass correlation coefficient is a parameter that
has been studied classically in statistics (see, e.g., Kendall and Stuart (1961,
pp. 302-304)). It has found extensive applications in several different fields
of study, including use as a measure of the degree of familial resemblance
with respect to biological and environmental characteristics. It also plays an
important role in reliability theory involving observations on a sample of various
judges or raters, and in sensitivity analysis, where it has been used to measure
the efficacy of an experimental treatment. For a review of inference procedures
for the intraclass correlation coefficient in the one-way random effects model,
see Donner (1986).
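The parameter $\rho$ is usually estimated from the one-way analysis of variance. The sketch below uses the common ANOVA (moment) estimates $\hat\sigma_e^2 = MS_W$ and $\hat\sigma_\alpha^2 = (MS_B - MS_W)/n$ for a balanced layout with $a$ groups of size $n$; this estimator is an illustration and not a procedure taken from the text. Its smallest possible value, reached when $MS_B = 0$, is $-1/(n-1)$.

```python
def icc_oneway(groups):
    """ANOVA estimate of the intraclass correlation, balanced one-way layout.

    With sigma_e^2_hat = MS_W and sigma_alpha^2_hat = (MS_B - MS_W) / n,
    rho_hat = (MS_B - MS_W) / (MS_B + (n - 1) * MS_W).
    """
    a = len(groups)
    n = len(groups[0])
    if a < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need a >= 2 groups of equal size n >= 2")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / a
    ms_b = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_w = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (a * (n - 1))
    return (ms_b - ms_w) / (ms_b + (n - 1) * ms_w)


print(icc_oneway([[1.0, 2.0], [3.0, 4.0]]))  # hypothetical data
```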


N ANALYSIS OF COVARIANCE 


The analysis of covariance is a combination of analysis of variance and regression.
In analysis of variance, all the factors being studied are treated qualitatively,
and in regression analysis all the factors are treated quantitatively. In analysis
of covariance, some factors are treated qualitatively and some are treated
quantitatively. The term independent variable often refers to a factor treated
quantitatively in analysis of covariance and regression. The term covariate or
concomitant variable is also used to denote an independent variable in an analysis
of covariance. The analysis of covariance involves adjusting the observed
value of the response or dependent variable for the linear effect of the
concomitant variable. If such an adjustment for the effects of the concomitant
variable is not made, the estimate of the error mean square would be inflated,
which would make the analysis of variance test less sensitive. The adjustment,
or elimination of the linear effect of the concomitant variable, generally results
in a smaller error mean square. The analysis of covariance uses regression analysis
techniques for elimination of the linear effect of the concomitant variable. The
technique was originally introduced by Fisher (1932), and Cochran (1957)
presented a detailed account of the subject. Analysis of covariance techniques,
however, are generally complicated and are often considered to be among the
most misunderstood and misused statistical techniques commonly employed
by researchers. Readable accounts of the subject are given in Snedecor and
Cochran (1989, Chapter 18) and Winer et al. (1991, Chapter 10). For a more
mathematical treatment of the topic, see Scheffé (1959, Chapter 6). An extended
expository review of the analysis of covariance is contained in a set of
seven papers which appeared in a special issue of Biometrics (Vol. 13, No. 3,
1957). A subsequent issue of the same journal (Vol. 38, No. 3, 1982) includes
discussion of complex designs and nonlinear models. Further discussions on
analysis of covariance can be found in a series of papers appearing in a special
issue of Communications in Statistics: Part A, Theory and Methods (Vol. 8,
No. 8, 1979). For a book-length account of the subject, see Huitema (1980).
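As an illustration of the adjustment described above, the following sketch fits a one-way analysis of covariance with a single covariate, assuming a common (pooled within-group) regression slope for all groups. The function name and data are hypothetical; it returns the within-group slope $b_W$ and the adjusted means $\bar{y}_{i.} - b_W(\bar{x}_{i.} - \bar{x}_{..})$.

```python
def ancova_adjusted_means(groups):
    """One-way ANCOVA sketch: pooled within-group slope and adjusted means.

    groups: list of (x_values, y_values) pairs, one pair per treatment group.
    Returns (b_w, adjusted) where adjusted[i] = ybar_i - b_w * (xbar_i - xbar).
    """
    sxx = sxy = 0.0
    xbars, ybars, all_x = [], [], []
    for xs, ys in groups:
        xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx += sum((x - xbar) ** 2 for x in xs)
        sxy += sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
        xbars.append(xbar)
        ybars.append(ybar)
        all_x.extend(xs)
    b_w = sxy / sxx  # within-group (error) regression coefficient
    xbar_all = sum(all_x) / len(all_x)
    adjusted = [yb - b_w * (xb - xbar_all) for xb, yb in zip(xbars, ybars)]
    return b_w, adjusted


# Hypothetical data: y = 2x in group 1 and y = 2x + 5 in group 2.
data = [([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), ([2.0, 3.0, 4.0], [9.0, 11.0, 13.0])]
print(ancova_adjusted_means(data))
```

The unadjusted means differ by 7, but after removing the covariate's linear effect the adjusted means differ by the true group effect of 5.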


O EQUIVALENCE OF THE ANOVA F AND 
TWO-SAMPLE t TESTS 


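In this appendix, it is shown that in a one-way classification with two groups,
the analysis of variance F test is equivalent to the two-sample t test. Consider a
one-way classification with $a$ groups and let the $i$-th group contain $n_i$ observations
with $N = \sum_{i=1}^{a} n_i$. Let $y_{ij}$ be the $j$-th observation from the $i$-th group
($i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, n_i$). The F statistic for testing the hypothesis of the
equality of treatment means is defined by

$$F = \frac{MS_B}{MS_W} = \frac{SS_B/df_B}{SS_W/df_W},$$

where

$$SS_B = \sum_{i=1}^{a} n_i(\bar{y}_{i.} - \bar{y}_{..})^2, \qquad SS_W = \sum_{i=1}^{a}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i.})^2,$$

$$df_B = a - 1 \quad\text{and}\quad df_W = N - a.$$

For the case of two groups (i.e., $a = 2$), it follows that

$$SS_B = n_1(\bar{y}_{1.} - \bar{y}_{..})^2 + n_2(\bar{y}_{2.} - \bar{y}_{..})^2,$$

$$SS_W = (n_1 - 1)S_1^2 + (n_2 - 1)S_2^2,$$

$$df_B = 1 \quad\text{and}\quad df_W = n_1 + n_2 - 2,$$

where

$$S_1^2 = \frac{\sum_{j=1}^{n_1}(y_{1j} - \bar{y}_{1.})^2}{n_1 - 1} \quad\text{and}\quad S_2^2 = \frac{\sum_{j=1}^{n_2}(y_{2j} - \bar{y}_{2.})^2}{n_2 - 1}.$$

Furthermore, noting that

$$\bar{y}_{..} = \frac{n_1\bar{y}_{1.} + n_2\bar{y}_{2.}}{n_1 + n_2},$$

we have

$$(\bar{y}_{1.} - \bar{y}_{..})^2 = \frac{n_2^2(\bar{y}_{1.} - \bar{y}_{2.})^2}{(n_1 + n_2)^2}$$

and

$$(\bar{y}_{2.} - \bar{y}_{..})^2 = \frac{n_1^2(\bar{y}_{1.} - \bar{y}_{2.})^2}{(n_1 + n_2)^2}.$$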


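Now, making the substitution, $SS_B$ can be written as

$$SS_B = \frac{n_1 n_2^2(\bar{y}_{1.} - \bar{y}_{2.})^2 + n_1^2 n_2(\bar{y}_{1.} - \bar{y}_{2.})^2}{(n_1 + n_2)^2} = \frac{n_1 n_2(\bar{y}_{1.} - \bar{y}_{2.})^2}{n_1 + n_2}.$$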




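Finally, the statistic F can be written simply as

$$F = \frac{(\bar{y}_{1.} - \bar{y}_{2.})^2 \Big/ \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}{\{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2\}/(n_1 + n_2 - 2)} = \frac{(\bar{y}_{1.} - \bar{y}_{2.})^2}{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$

where

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}.$$

Since a two-sample t statistic with $n_1 + n_2 - 2$ degrees of freedom is defined
by

$$t = \frac{\bar{y}_{1.} - \bar{y}_{2.}}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},$$

it follows that $F = t^2$.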
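The identity $F = t^2$ is easy to check numerically. The sketch below computes both statistics from first principles for two groups of unequal size; the data values are arbitrary.

```python
def oneway_f(groups):
    """One-way ANOVA F = MS_B / MS_W."""
    a = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_w = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_b / (a - 1)) / (ss_w / (n_total - a))


def pooled_t(y1, y2):
    """Two-sample t statistic with pooled variance S_p^2."""
    n1, n2 = len(y1), len(y2)
    m1, m2 = sum(y1) / n1, sum(y2) / n2
    s1 = sum((y - m1) ** 2 for y in y1) / (n1 - 1)
    s2 = sum((y - m2) ** 2 for y in y2) / (n2 - 1)
    sp2 = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    return (m1 - m2) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5


y1 = [5.1, 4.8, 6.0, 5.5]
y2 = [6.2, 6.8, 5.9]
print(oneway_f([y1, y2]), pooled_t(y1, y2) ** 2)
```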


P EQUIVALENCE OF THE ANOVA F AND PAIRED t TESTS


In this appendix, it is shown that in a randomized block design with two treatments,
the analysis of variance F test is equivalent to the paired t test. Consider a
randomized block design with $n$ blocks and $t$ treatments. Let $y_{ij}$ be the observation
from the $i$-th treatment and the $j$-th block ($i = 1, 2, \ldots, t$; $j = 1, 2, \ldots, n$).
The F statistic for testing the hypothesis of the equality of treatment means is
defined by

$$F = \frac{MS_\tau}{MS_E} = \frac{SS_\tau/df_\tau}{SS_E/df_E},$$

where

$$SS_\tau = n\sum_{i=1}^{t}(\bar{y}_{i.} - \bar{y}_{..})^2,$$

$$SS_E = \sum_{i=1}^{t}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2,$$

$$df_\tau = t - 1 \quad\text{and}\quad df_E = (t - 1)(n - 1).$$

For the case of two treatments (i.e., $t = 2$), it follows that

$$SS_\tau = n[(\bar{y}_{1.} - \bar{y}_{..})^2 + (\bar{y}_{2.} - \bar{y}_{..})^2],$$

$$SS_E = \sum_{j=1}^{n}[(y_{1j} - \bar{y}_{1.} - \bar{y}_{.j} + \bar{y}_{..})^2 + (y_{2j} - \bar{y}_{2.} - \bar{y}_{.j} + \bar{y}_{..})^2],$$

$$df_\tau = 1 \quad\text{and}\quad df_E = n - 1.$$


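Furthermore, noting that

$$\bar{y}_{.j} = \frac{y_{1j} + y_{2j}}{2} \quad\text{and}\quad \bar{y}_{..} = \frac{\bar{y}_{1.} + \bar{y}_{2.}}{2},$$

we have

$$(\bar{y}_{1.} - \bar{y}_{..})^2 = \frac{(\bar{y}_{1.} - \bar{y}_{2.})^2}{4},$$

$$(\bar{y}_{2.} - \bar{y}_{..})^2 = \frac{(\bar{y}_{1.} - \bar{y}_{2.})^2}{4},$$

$$(y_{1j} - \bar{y}_{1.} - \bar{y}_{.j} + \bar{y}_{..})^2 = \frac{[(y_{1j} - y_{2j}) - (\bar{y}_{1.} - \bar{y}_{2.})]^2}{4},$$

and

$$(y_{2j} - \bar{y}_{2.} - \bar{y}_{.j} + \bar{y}_{..})^2 = \frac{[(y_{1j} - y_{2j}) - (\bar{y}_{1.} - \bar{y}_{2.})]^2}{4}.$$

Now, making the substitution, $SS_\tau$ and $SS_E$ can be written as

$$SS_\tau = \frac{n}{2}(\bar{y}_{1.} - \bar{y}_{2.})^2$$

and

$$SS_E = \frac{1}{2}\sum_{j=1}^{n}[(y_{1j} - y_{2j}) - (\bar{y}_{1.} - \bar{y}_{2.})]^2.$$

Again, letting $d_j = y_{1j} - y_{2j}$ and $\bar{d} = \sum_{j=1}^{n} d_j/n = \bar{y}_{1.} - \bar{y}_{2.}$, we obtain

$$SS_\tau = \frac{n}{2}\bar{d}^2$$

and

$$SS_E = \frac{1}{2}\sum_{j=1}^{n}(d_j - \bar{d})^2.$$

Finally, the statistic F can be written simply as

$$F = \frac{\frac{n}{2}\bar{d}^2/1}{\frac{1}{2}\sum_{j=1}^{n}(d_j - \bar{d})^2/(n-1)} = \frac{\bar{d}^2}{S_d^2/n},$$

where

$$S_d^2 = \frac{\sum_{j=1}^{n}(d_j - \bar{d})^2}{n - 1}.$$

Since a paired t statistic with $n - 1$ degrees of freedom is defined by

$$t = \frac{\bar{d}}{S_d/\sqrt{n}},$$

it follows that $F = t^2$.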
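A numerical check of the randomized block case, computed directly from the definitions above for $t = 2$ treatments; the data are hypothetical paired measurements.

```python
def rbd_treatment_f(y):
    """Randomized block F = MS_tau / MS_E; y[i][j] = treatment i, block j."""
    t, n = len(y), len(y[0])
    trt = [sum(row) / n for row in y]                          # treatment means
    blk = [sum(y[i][j] for i in range(t)) / t for j in range(n)]  # block means
    grand = sum(trt) / t
    ss_trt = n * sum((m - grand) ** 2 for m in trt)
    ss_err = sum((y[i][j] - trt[i] - blk[j] + grand) ** 2
                 for i in range(t) for j in range(n))
    return (ss_trt / (t - 1)) / (ss_err / ((t - 1) * (n - 1)))


def paired_t(y1, y2):
    """Paired t = dbar / (S_d / sqrt(n)) with d_j = y1j - y2j."""
    d = [a - b for a, b in zip(y1, y2)]
    n = len(d)
    dbar = sum(d) / n
    s2d = sum((dj - dbar) ** 2 for dj in d) / (n - 1)
    return dbar / (s2d / n) ** 0.5


before = [12.0, 15.5, 11.2, 14.8, 13.1]
after = [11.1, 14.0, 11.5, 13.2, 12.0]
print(rbd_treatment_f([before, after]), paired_t(before, after) ** 2)
```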


Q EXPECTED VALUE AND VARIANCE


If $X$ is a discrete random variable with probability function $p(x)$, the expected
value of $X$, denoted by $E(X)$, is defined as

$$E(X) = \sum_{i=1}^{\infty} x_i\, p(x_i),$$

provided that $\sum_{i=1}^{\infty} |x_i|\, p(x_i) < \infty$. If the series diverges, the expected value
is undefined. If $X$ is a continuous random variable with probability density
function $f(x)$, then

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx,$$

provided that $\int_{-\infty}^{\infty} |x| f(x)\,dx < \infty$. If the integral diverges, the expected value
is undefined. $E(X)$ is also referred to as the mathematical expectation or the
mean value of $X$ and is often denoted by $\mu$.

The expected value of a random variable is its average value and can be
considered as the center of the distribution. For any constants $a, b_1, b_2, \ldots, b_k$
and the random variables $X_1, X_2, \ldots, X_k$, the following properties hold:

$$E(a) = a, \qquad E(b_i X_i) = b_i E(X_i),$$

and

$$E\left(a + \sum_{i=1}^{k} b_i X_i\right) = a + \sum_{i=1}^{k} b_i E(X_i).$$


If $X$ is a random variable with expected value $E(X)$, the variance of $X$,
denoted by $\operatorname{Var}(X)$, is defined as

$$\operatorname{Var}(X) = E[X - E(X)]^2,$$

provided the expectation exists. The square root of the variance is known as
the standard deviation. The variance is often denoted by $\sigma^2$ and the standard
deviation by $\sigma$.

The variance of a random variable is the average or expected value of squared
deviations from the mean and measures the variation around the mean value.
For any constants $a$ and $b$, and the random variable $X$, the following properties
hold:

$$\operatorname{Var}(a) = 0, \qquad \operatorname{Var}(bX) = b^2\operatorname{Var}(X),$$

and

$$\operatorname{Var}(a + bX) = b^2\operatorname{Var}(X).$$
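The definitions and properties above can be checked exactly for a discrete distribution using rational arithmetic. The sketch below (a fair die is an arbitrary choice) computes $E(X)$ and $\operatorname{Var}(X)$ from the probability function and confirms $\operatorname{Var}(a + bX) = b^2\operatorname{Var}(X)$ for one choice of $a$ and $b$.

```python
from fractions import Fraction


def expectation(dist, g=lambda x: x):
    """E[g(X)] for a discrete distribution given as {value: probability}."""
    return sum(p * g(x) for x, p in dist.items())


def variance(dist):
    """Var(X) = E[X - E(X)]^2."""
    mu = expectation(dist)
    return expectation(dist, lambda x: (x - mu) ** 2)


die = {x: Fraction(1, 6) for x in range(1, 7)}  # a fair six-sided die
a, b = 3, 2
shifted = {a + b * x: p for x, p in die.items()}  # distribution of a + bX

print(expectation(die), variance(die))  # 7/2 35/12
assert variance(shifted) == b ** 2 * variance(die)
```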


R COVARIANCE AND CORRELATION 


If $X$ and $Y$ are jointly distributed random variables with expected values $E(X)$
and $E(Y)$, the covariance of $X$ and $Y$, denoted by $\operatorname{Cov}(X, Y)$, is defined as

$$\operatorname{Cov}(X, Y) = E\{[X - E(X)][Y - E(Y)]\}.$$

The covariance is the average value of the products of the deviations of the
values of $X$ from its mean and the deviations of the values of $Y$ from its mean.
It can be readily shown that

$$\operatorname{Cov}(X, Y) = E(XY) - E(X)E(Y).$$


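The variance of a random variable is a measure of its variability, and the
covariance of two random variables can be considered as a measure of their
joint variability or the degree of association. For any constants $a, b, c, d$, and
the random variables $X, Y, U, V$, the following properties hold:

$$\operatorname{Cov}(a, X) = 0, \qquad \operatorname{Cov}(aX, bY) = ab\operatorname{Cov}(X, Y),$$

and

$$\operatorname{Cov}(aX + bY, cU + dV) = ac\operatorname{Cov}(X, U) + ad\operatorname{Cov}(X, V) + bc\operatorname{Cov}(Y, U) + bd\operatorname{Cov}(Y, V).$$

In general, for any constants $a_i$'s, $b_j$'s, and the random variables $X_i$'s and $Y_j$'s
($i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, \ell$), the following relationships hold:

$$\operatorname{Cov}\left(\sum_{i=1}^{k} a_i X_i, \sum_{j=1}^{\ell} b_j Y_j\right) = \sum_{i=1}^{k}\sum_{j=1}^{\ell} a_i b_j\operatorname{Cov}(X_i, Y_j)$$

and

$$\operatorname{Var}\left(\sum_{i=1}^{k} a_i X_i\right) = \sum_{i=1}^{k} a_i^2\operatorname{Var}(X_i) + 2\sum_{i=1}^{k}\sum_{\substack{i'=1 \\ i < i'}}^{k} a_i a_{i'}\operatorname{Cov}(X_i, X_{i'}).$$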

If $X$ and $Y$ are jointly distributed random variables, the correlation of $X$ and
$Y$, denoted by $\rho$, is defined as

$$\rho = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}.$$

The correlation can be considered as the standardized covariance. Correlation
equals covariance if both variables are expressed as standardized scores with unit
variances. It can be shown that $-1 \le \rho \le 1$.
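The variance formula for a linear combination can be verified on data. Because sample variances and covariances obey the same bilinear rules, the identity holds exactly for sample quantities too. The sketch below (hypothetical data and coefficients) compares the variance of the combined column with the expanded formula and also computes $\rho$ from sample moments.

```python
from itertools import combinations


def sample_cov(x, y):
    """Sample covariance with divisor n - 1; sample_cov(x, x) is the sample variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)


def var_linear_combination(coefs, columns):
    """sum a_i^2 Var(X_i) + 2 * sum_{i < i'} a_i a_i' Cov(X_i, X_i')."""
    total = sum(a * a * sample_cov(x, x) for a, x in zip(coefs, columns))
    for (i, xi), (j, xj) in combinations(enumerate(columns), 2):
        total += 2 * coefs[i] * coefs[j] * sample_cov(xi, xj)
    return total


def correlation(x, y):
    """rho = Cov(X, Y) / sqrt(Var(X) Var(Y)), using sample moments."""
    return sample_cov(x, y) / (sample_cov(x, x) * sample_cov(y, y)) ** 0.5


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
x3 = [5.0, 3.0, 2.0, 2.0, 1.0]
coefs = [2.0, -1.0, 0.5]
z = [2.0 * a - b + 0.5 * c for a, b, c in zip(x1, x2, x3)]  # the combination itself
print(sample_cov(z, z), var_linear_combination(coefs, [x1, x2, x3]))
print(correlation(x1, x2))
```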


S RULES FOR DETERMINING THE ANALYSIS 
OF VARIANCE MODEL 


In this appendix, we outline rules for determining the analysis of variance 
model in a balanced experimental layout. The rules are applicable to crossed 
classifications containing an equal number of observations for each combination
of factor levels. They are also applicable to completely nested classifications 
as well as crossed-nested classifications containing an equal number of levels 
for each nested factor. We illustrate the rules with a three-factor crossed-nested 
classification where factors A and C are crossed and factor B is nested within 
factor A and crossed with factor C. We assume that the factor A has a levels, 
the factor B has b levels, the factor C has c levels, and there are n replications. 


Rule 1. Each model contains a general constant or overall mean to be denoted
by $\mu$.


Rule 2. Each model contains a main effect for each factor, which is denoted
by the corresponding Greek letter with a suffix indicating the level of the factor.
If a factor is nested within another factor, the nesting is indicated using
the parenthesis notation for its suffix. For the example being considered, the
main effects for factors A, B, and C are $\alpha_i$, $\beta_{j(i)}$, and $\gamma_k$, $i = 1, \ldots, a$;
$j = 1, \ldots, b$; $k = 1, \ldots, c$.


Rule 3. Each model contains interaction terms corresponding to all crossed
factors. There are no interaction terms for those factors containing both a nested
factor and the factor within which it is nested. For the example being considered,
there are A × C and B × C interactions. However, there are no A × B
and A × B × C interactions since factor B is nested within factor A. The
interaction terms in the model are denoted by the combination of the Greek
letters enclosed within a pair of parentheses followed by subscripts indicating
the levels of the factors being crossed. For the example being considered, the
model terms for the A × C and B × C interactions are $(\alpha\gamma)_{ik}$ and $(\beta\gamma)_{jk}$,
$i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, c$.


Rule 4. Interactions between a nested factor and another factor with which
the nested factor is crossed are always themselves nested. In the example being
considered, factor B is nested within factor A and is crossed with factor C;
thus, the B × C interaction is considered as nested within factor A. If an
interaction term is nested within another, the nesting is indicated by the parenthesis
notation for its suffix. For the example being considered, the fact that $(\beta\gamma)_{jk}$ is
nested within the levels of factor A is indicated by the parenthesis notation as
$(\beta\gamma)_{jk(i)}$, $i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, c$.


Rule 5. The final term in the model is the error term, which is considered nested
within all the factors. For the example being considered, the model term for the
error is denoted by $e_{\ell(ijk)}$, $i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, c$;
$\ell = 1, \ldots, n$.


Rule 6. The final model is written as an algebraic equation between the
response variable, usually denoted by the Roman letter x or y (with a suffix
comprising all the subscripts), on the left-hand side and the sum of all the model
terms appearing on the right-hand side. For the example being considered, the
model is:

$$y_{ijk\ell} = \mu + \alpha_i + \beta_{j(i)} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk(i)} + e_{\ell(ijk)}, \quad i = 1, \ldots, a;\ j = 1, \ldots, b;\ k = 1, \ldots, c;\ \ell = 1, \ldots, n.$$


T RULES FOR CALCULATING SUMS OF SQUARES 
AND DEGREES OF FREEDOM 


In this appendix, we outline rules for calculating sums of squares and degrees 
of freedom in an analysis of variance model. The rules are applicable to crossed 
classifications containing an equal number of observations for each combination 
of factor levels. They are also applicable to completely nested classifications as 
well as crossed-nested classifications containing an equal number of levels for 
each nested factor. We illustrate the rules with the three-factor crossed-nested 
classification considered in Appendix S. 


Rule 1. Write the model equation following the rules outlined in Appendix S.
For the example being considered, the model equation is

$$y_{ijk\ell} = \mu + \alpha_i + \beta_{j(i)} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk(i)} + e_{\ell(ijk)}.$$


Rule 2. For each model term (except the general constant) write a symbolic
product consisting of the subscripts of the term, using the subscript alone if it
is in parentheses and the subscript minus 1 if it is not in parentheses. Expand the
symbolic product algebraically. For example, the symbolic product for $\alpha_i$ is
$i - 1$, for $(\beta\gamma)_{jk(i)}$ it is $i(j-1)(k-1) = ijk - ij - ik + i$, and so on.


Rule 3. The typical expression to be squared and summed for obtaining the
sum of squares associated with a model term consists of algebraic means indexed
by the subscripts of the symbolic product determined by Rule 2 and dots for
the subscripts missing in the symbolic product. The number 1 is replaced by
the suffix containing all the dots and designates the grand mean. In the example
being considered, the symbolic product for $\alpha_i$ is $i - 1$ and the algebraic
expression to be squared and summed is $\bar{y}_{i...} - \bar{y}_{....}$. The symbolic product for
$(\beta\gamma)_{jk(i)}$ is $i(j-1)(k-1) = ijk - ij - ik + i$, and the algebraic expression
to be squared and summed is $\bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{i.k.} + \bar{y}_{i...}$, and so on.
Rule 4. The sum of squares for a model term is obtained by squaring the
algebraic expression formed by Rule 3, summing it over the subscripts in the
model term, and then multiplying it by the product of the numbers of levels
corresponding to the subscripts not appearing in the suffix of the model term.
For the example being considered, the sum of squares for $\alpha_i$ is obtained by
squaring $(\bar{y}_{i...} - \bar{y}_{....})$, summing over $i$, and then multiplying it by $bcn$; that is,
$bcn\sum_{i=1}^{a}(\bar{y}_{i...} - \bar{y}_{....})^2$. Similarly, the sum of squares for $(\beta\gamma)_{jk(i)}$ is obtained
by squaring $(\bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{i.k.} + \bar{y}_{i...})$, summing over $(i, j, k)$, and then
multiplying it by $n$; that is, $n\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}(\bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{i.k.} + \bar{y}_{i...})^2$,
and so on.


Rule 5. The sum of squares for the general constant is obtained by squaring
the grand mean ($\bar{y}_{....}$) and then multiplying it by the total number of observations
($abcn$), that is, $abcn\,\bar{y}_{....}^2$. This sum of squares is usually not included in the
analysis of variance table.


Rule 6. The total sum of squares is obtained by squaring the deviations of the
observations from the grand mean, and then summing over all the subscripts,
that is, $\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{\ell=1}^{n}(y_{ijk\ell} - \bar{y}_{....})^2$.


Rule 7. The degrees of freedom corresponding to a sum of squares are
calculated by replacing the subscripts in the symbolic product formed by Rule 2
by the number of levels for that subscript. For the example being considered,
the degrees of freedom corresponding to the sum of squares for $\alpha_i$ are obtained
by replacing $i$ by $a$ in the symbolic product $i - 1$; that is, $a - 1$. Similarly, for
$(\beta\gamma)_{jk(i)}$ the symbolic product is $i(j-1)(k-1)$ and the corresponding degrees
of freedom are $a(b-1)(c-1)$.


Rule 8. The number of degrees of freedom for the general constant is one, 
and the total number of degrees of freedom is defined as one less than the total 
number of observations. 
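Rules 2, 3, and 7 are mechanical enough to code. This sketch represents each model term by the subscripts outside and inside parentheses (a hypothetical encoding, with "l" standing for $\ell$), expands the symbolic product, prints the corresponding means, and computes the degrees of freedom; the numbers of levels are arbitrary.

```python
from itertools import combinations
from math import prod

COLUMNS = "ijkl"  # column order; "l" stands for the replicate subscript ell


def expand_symbolic_product(live, dead):
    """Rule 2: expand prod(dead) * prod(s - 1 for s in live) into signed monomials.

    Returns (sign, subscripts) pairs; the empty string plays the role of "1".
    """
    terms = []
    for r in range(len(live), -1, -1):
        for kept in combinations(live, r):
            sign = (-1) ** (len(live) - r)
            subs = "".join(s for s in COLUMNS if s in dead or s in kept)
            terms.append((sign, subs))
    return terms


def mean_label(subs):
    """Rule 3: replace each subscript missing from a monomial by a dot."""
    return "ybar_" + "".join(s if s in subs else "." for s in COLUMNS)


def degrees_of_freedom(live, dead, levels):
    """Rule 7: replace each subscript in the symbolic product by its number of levels."""
    return prod(levels[s] - 1 for s in live) * prod(levels[s] for s in dead)


# The crossed-nested example: (name, subscripts outside parentheses, inside parentheses).
TERMS = [("A", "i", ""), ("B(A)", "j", "i"), ("C", "k", ""),
         ("AC", "ik", ""), ("BC(A)", "jk", "i"), ("Error", "l", "ijk")]

levels = {"i": 3, "j": 4, "k": 2, "l": 5}  # hypothetical a, b, c, n
for name, live, dead in TERMS:
    expr = " ".join(f"{'+' if sgn > 0 else '-'}{mean_label(s)}"
                    for sgn, s in expand_symbolic_product(live, dead))
    print(f"{name:6s} df = {degrees_of_freedom(live, dead, levels):3d}   {expr}")
```

For the error term the first "mean" carries every subscript, so it is simply the observation $y_{ijk\ell}$. With these levels the degrees of freedom add up to $abcn - 1 = 119$, in agreement with Rule 8.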


U RULES FOR FINDING EXPECTED MEAN SQUARES 


Determination of expected mean squares in an analysis of variance model is 
essential in order that appropriate mean squares may be used to construct an 
F statistic for a particular hypothesis of interest. They are also important for 
finding estimators of the variance components. Although they are not difficult to 
obtain, it is evident from our previous treatment that the derivation of expected 
mean squares for various models can be tedious, involving an inordinate amount 
of time and effort. In this appendix, we outline rules for finding expected mean 
squares in an analysis of variance model. The rules are applicable to crossed
classifications containing an equal number of observations for each combination 
of factor levels. They are also applicable to completely nested classifications 




as well as crossed-nested classifications containing an equal number of levels 
for each nested factor. We illustrate the rules with a three-factor crossed-nested 
classification considered earlier in Appendices S and T. It should be noted 
here that in the determination of rules for the analysis of variance model and 
the calculation of sums of squares and degrees of freedom, it does not matter 
whether factor effects are fixed or random. However, this is not so in finding 
expected mean squares and we now assume that factors A and C are fixed 
whereas factor B is random. 


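Rule 1. Write the mathematical model following the rules given in Appendix
S, including the assumptions for fixed and random effects. For the example
being considered, the mathematical model is

$$y_{ijk\ell} = \mu + \alpha_i + \beta_{j(i)} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk(i)} + e_{\ell(ijk)}, \quad i = 1, \ldots, a;\ j = 1, \ldots, b;\ k = 1, \ldots, c;\ \ell = 1, \ldots, n,$$

where the $\alpha_i$'s, $\gamma_k$'s, and $(\alpha\gamma)_{ik}$'s are assumed to be constants subject to the
restrictions:

$$\sum_{i=1}^{a}\alpha_i = \sum_{k=1}^{c}\gamma_k = 0, \qquad \sum_{i=1}^{a}(\alpha\gamma)_{ik} = \sum_{k=1}^{c}(\alpha\gamma)_{ik} = 0.$$

We further assume that the $\beta_{j(i)}$'s, $(\beta\gamma)_{jk(i)}$'s, and $e_{\ell(ijk)}$'s are normally
distributed with zero means and variances $\sigma_{\beta(\alpha)}^2$, $\sigma_{\beta\gamma(\alpha)}^2$, and $\sigma_e^2$, respectively; and the
three groups of random variables are pairwise independent. The random effects
$(\beta\gamma)_{jk(i)}$'s, however, are correlated due to the following restrictions:

$$\sum_{k=1}^{c}(\beta\gamma)_{jk(i)} = 0 \quad \text{for all } j(i).$$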


Rule 2. Construct a two-way row × column table where there is a row for
each component term in the model including the error term (except the general
constant) and there is a column for each of the subscripts that appear in the
model. The particular order of rows and columns is immaterial, but it helps
to maintain some systematic scheme in order to avoid any mistakes. For the
example being considered, the two-way table is constructed as follows:

$$\begin{array}{l|cccc}
 & i & j & k & \ell \\ \hline
\alpha_i & & & & \\
\beta_{j(i)} & & & & \\
\gamma_k & & & & \\
(\alpha\gamma)_{ik} & & & & \\
(\beta\gamma)_{jk(i)} & & & & \\
e_{\ell(ijk)} & & & &
\end{array}$$


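Rule 3. In each row where one or more subscripts are in parentheses, write 1
in the columns corresponding to the subscripts in parentheses. For the example
being considered, the two-way table now appears as follows:

$$\begin{array}{l|cccc}
 & i & j & k & \ell \\ \hline
\alpha_i & & & & \\
\beta_{j(i)} & 1 & & & \\
\gamma_k & & & & \\
(\alpha\gamma)_{ik} & & & & \\
(\beta\gamma)_{jk(i)} & 1 & & & \\
e_{\ell(ijk)} & 1 & 1 & 1 &
\end{array}$$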


Rule 4. In each row where one or more subscripts are not in parentheses:

(i) write 1 in the columns corresponding to subscripts not in parentheses if
the subscript represents a random factor;

(ii) write 0 in the columns corresponding to subscripts not in parentheses if
the subscript represents a fixed factor.

For the example being considered, the two-way table now appears as follows:

$$\begin{array}{l|cccc}
 & i & j & k & \ell \\ \hline
\alpha_i & 0 & & & \\
\beta_{j(i)} & 1 & 1 & & \\
\gamma_k & & & 0 & \\
(\alpha\gamma)_{ik} & 0 & & 0 & \\
(\beta\gamma)_{jk(i)} & 1 & 1 & 0 & \\
e_{\ell(ijk)} & 1 & 1 & 1 & 1
\end{array}$$


Rule 5. Write the number of levels corresponding to the column subscripts in
the remaining cells that are still vacant. For the example being considered, the
two-way table now appears as follows:

$$\begin{array}{l|cccc}
 & i & j & k & \ell \\ \hline
\alpha_i & 0 & b & c & n \\
\beta_{j(i)} & 1 & 1 & c & n \\
\gamma_k & a & b & 0 & n \\
(\alpha\gamma)_{ik} & 0 & b & 0 & n \\
(\beta\gamma)_{jk(i)} & 1 & 1 & 0 & n \\
e_{\ell(ijk)} & 1 & 1 & 1 & 1
\end{array}$$


Rule 6. Each fixed effect has an effects parameter defined by the sum of
squared effects divided by its degrees of freedom; for example,
$\Phi(\alpha) = \sum_{i=1}^{a}\alpha_i^2/(a-1)$. Each random effect has the effects parameter
defined by the corresponding variance component. For every model term
representing a fixed effect, let $\lambda = \Phi$ designate the effects parameter. For every
model term representing a random effect, let $\lambda = \sigma^2$ be the variance component
for the random effect. Write all the $\lambda$ parameters in the last column to the right
of the two-way table, where each $\lambda$ parameter appears on the same line as its
corresponding model term. The two-way table now appears as follows:

$$\begin{array}{l|cccc|c}
 & i & j & k & \ell & \lambda \\ \hline
\alpha_i & 0 & b & c & n & \Phi(\alpha) \\
\beta_{j(i)} & 1 & 1 & c & n & \sigma_{\beta(\alpha)}^2 \\
\gamma_k & a & b & 0 & n & \Phi(\gamma) \\
(\alpha\gamma)_{ik} & 0 & b & 0 & n & \Phi(\alpha\gamma) \\
(\beta\gamma)_{jk(i)} & 1 & 1 & 0 & n & \sigma_{\beta\gamma(\alpha)}^2 \\
e_{\ell(ijk)} & 1 & 1 & 1 & 1 & \sigma_e^2
\end{array}$$


Rule 7. The expected mean square corresponding to any model term is obtained
as a linear combination of the $\lambda$ parameters determined by Rule 6, with
the coefficients determined as follows:

(i) The coefficient of a $\lambda$ parameter is zero if the subscript(s) of the model
term in that row (whether in parentheses or not) do not include all of
the subscripts (including those in parentheses) in the suffix of the model
term whose expected mean square is being evaluated.

(ii) The coefficients of the $\lambda$ parameters that are not defined as zero by
Rule 7(i) are determined by first deleting the columns corresponding to
the subscript(s) not in parentheses of the model term whose expected
mean square is being evaluated and then multiplying the entries of the
remaining columns.
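For the example being considered, the coefficients of the $\lambda$ parameters for
different model terms are given as follows (each column gives the coefficients in
the expected mean square of the term named at its head):

$$\begin{array}{l|cccccc}
\lambda & A\ [\alpha_i] & B(A)\ [\beta_{j(i)}] & C\ [\gamma_k] & AC\ [(\alpha\gamma)_{ik}] & BC(A)\ [(\beta\gamma)_{jk(i)}] & \text{Error}\ [e_{\ell(ijk)}] \\ \hline
\Phi(\alpha) & bcn & 0 & 0 & 0 & 0 & 0 \\
\sigma_{\beta(\alpha)}^2 & cn & cn & 0 & 0 & 0 & 0 \\
\Phi(\gamma) & 0 & 0 & abn & 0 & 0 & 0 \\
\Phi(\alpha\gamma) & 0 & 0 & 0 & bn & 0 & 0 \\
\sigma_{\beta\gamma(\alpha)}^2 & 0 & 0 & n & n & n & 0 \\
\sigma_e^2 & 1 & 1 & 1 & 1 & 1 & 1
\end{array}$$

Finally, from the preceding table, the expected mean squares are given by

$$E(MS_A) = \sigma_e^2 + cn\,\sigma_{\beta(\alpha)}^2 + bcn\,\Phi(\alpha),$$

$$E(MS_{B(A)}) = \sigma_e^2 + cn\,\sigma_{\beta(\alpha)}^2,$$

$$E(MS_C) = \sigma_e^2 + n\,\sigma_{\beta\gamma(\alpha)}^2 + abn\,\Phi(\gamma),$$

$$E(MS_{AC}) = \sigma_e^2 + n\,\sigma_{\beta\gamma(\alpha)}^2 + bn\,\Phi(\alpha\gamma),$$

$$E(MS_{BC(A)}) = \sigma_e^2 + n\,\sigma_{\beta\gamma(\alpha)}^2,$$

and

$$E(MS_E) = \sigma_e^2.$$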
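The table-based Rules 2 through 7 can likewise be sketched in code for the same example. The encoding of terms and the RANDOM map (A and C fixed, B and the replicates random) are assumptions that mirror the text. Each returned dictionary maps a row's model term to its coefficient, and the row name stands for that row's $\lambda$ parameter.

```python
from math import prod

COLUMNS = "ijkl"  # "l" is the replicate subscript (ell)

# (name, subscripts outside parentheses, subscripts inside parentheses)
TERMS = [("A", "i", ""), ("B(A)", "j", "i"), ("C", "k", ""),
         ("AC", "ik", ""), ("BC(A)", "jk", "i"), ("Error", "l", "ijk")]
# Whether the factor attached to each subscript is random: A and C fixed,
# B random, and the replicate (error) subscript random.
RANDOM = {"i": False, "j": True, "k": False, "l": True}


def table_entry(live, dead, col, levels):
    """Rules 3-5: entry of the two-way table for one row and one column."""
    if col in dead:                   # Rule 3: subscript in parentheses -> 1
        return 1
    if col in live:                   # Rule 4: 1 if random, 0 if fixed
        return 1 if RANDOM[col] else 0
    return levels[col]                # Rule 5: number of levels


def expected_mean_square(target, levels):
    """Rule 7: nonzero coefficients of the lambda parameters in E(MS_target)."""
    _, t_live, t_dead = target
    needed = set(t_live) | set(t_dead)
    coefs = {}
    for name, live, dead in TERMS:
        if not needed <= set(live) | set(dead):   # Rule 7(i): coefficient zero
            continue
        # Rule 7(ii): drop the target's unparenthesized columns, multiply the rest.
        c = prod(table_entry(live, dead, col, levels)
                 for col in COLUMNS if col not in t_live)
        if c:
            coefs[name] = c
    return coefs


levels = {"i": 3, "j": 4, "k": 2, "l": 5}  # hypothetical a, b, c, n
for term in TERMS:
    print(term[0], expected_mean_square(term, levels))
```

With $a = 3$, $b = 4$, $c = 2$, $n = 5$, the line for A reads {'A': 40, 'B(A)': 10, 'Error': 1}, that is, $\sigma_e^2 + cn\,\sigma_{\beta(\alpha)}^2 + bcn\,\Phi(\alpha)$.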


V SAMPLES AND SAMPLING DISTRIBUTION 


The major objective of any statistical analysis is to make inferences about the 
parameters of the population(s) under study. If the population is finite and con- 
tains only a small number of items or individuals, then it would be ideal to 
include every member of the population to record or examine the characteristic(s)
of interest. However, most populations of interest are either infinite or
so large that it is not feasible in terms of time and money to include every
member of the population in the study. Hence, in order to study such a population,
the investigator carefully draws a sample, which is much smaller than
the population, to examine its properties and then generalizes the results of the 
sample to the population of interest. The process of generalizing the results of 
a sample to the population is called statistical inference. 

The basic requirement of a sample is that it should be representative of
the population under study. However, in general, it is difficult to obtain a
representative sample. The usual procedure is to select a sample that is random.




The concept of randomness is intended to ensure that individual biases do not 
influence the selection of sample values. In addition, the randomness makes 
it possible to apply the laws of probability in drawing statistical inferences. A 
random sample is usually drawn with the help of a mechanical process such as 
throwing a coin or spinning a roulette wheel. The mechanical process generally 
used to obtain a random sample involves the use of a table of random numbers 
(see Statistical Tables XXIV). In addition, a great variety of computer programs 
exists for obtaining a random sample. The standard procedures for obtaining a 
random sample from a finite population using random numbers are discussed in 
most introductory statistics textbooks (see, e.g., Snedecor and Cochran (1989, 
Section 1.9)) and are not described here. 

For the purpose of statistical inference discussed in Appendix W, we assume
that we have a random sample $(x_1, x_2, \ldots, x_n)$ of a given size $n$, where $x_i$ is the
observed value of a certain characteristic $X$ on the $i$-th member of the sample.
We then calculate some function $T(x_1, x_2, \ldots, x_n)$ of the random sample, called
a statistic. We repeat the procedure for every possible sample of size $n$ that can
be drawn from the population. Now, the successive samples will differ from
one another and will lead to different values of the statistic $T$. Using a random
mechanism, we can draw repeated samples, calculate the value of $T$ for each
sample, and derive a frequency distribution of the statistic $T$. Such a distribution
of $T$ is known as the sampling distribution of $T$. The following figure shows a
schematic representation of the sampling distribution.


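[Figure: schematic representation of the sampling distribution. Successive samples (1st sample, 2nd sample, etc.) are drawn from the parent population (distribution of X); the values of T computed from them form the sampling distribution of T.]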
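In practice the "every possible sample" step is approximated by simulation. The sketch below assumes a normal parent population with mean 50 and standard deviation 10 (arbitrary choices) and takes $T$ to be the sample mean; the function name is hypothetical. The mean and standard deviation of the simulated $T$ values should be close to 50 and $10/\sqrt{25} = 2$.

```python
import random
import statistics


def sampling_distribution_of_mean(population_draw, n, replications, seed=0):
    """Draw `replications` samples of size n; return T = sample mean for each."""
    rng = random.Random(seed)
    return [statistics.fmean(population_draw(rng) for _ in range(n))
            for _ in range(replications)]


def draw(rng):
    """One observation from the assumed parent population N(50, 10^2)."""
    return rng.gauss(50.0, 10.0)


t_values = sampling_distribution_of_mean(draw, n=25, replications=5000)
print(statistics.fmean(t_values), statistics.pstdev(t_values))
```

A histogram of t_values would play the role of the right-hand panel in the figure.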


W METHODS OF STATISTICAL INFERENCE 


The objective in an analysis of variance procedure is to make statistical infer- 
ences about the unknown parameters of the linear model. The main procedures 
for making inferences are hypothesis testing and point and interval estimation. 
In this appendix, we briefly summarize basic concepts of each procedure. 
HYPOTHESIS TESTING


In hypothesis testing the investigator is interested in a particular value of an
unknown parameter and wants to employ a statistical test to determine whether
the data are consistent with the hypothesized value. The particular hypothesis to
be tested is referred to as the null hypothesis and is denoted by $H_0$. In addition,
there is another hypothesis, the complement of the null hypothesis, which is
concluded if the null hypothesis is found to be false. The complement of the
null hypothesis is referred to as the alternative hypothesis and is denoted by
$H_1$.

Working provisionally on the assumption that $H_0$ is true, a test statistic is
calculated from the data, as an index or measure, which is sensitive to departures
from $H_0$. Extreme values of the test statistic are unlikely to occur if $H_0$
is true and consequently lead to its rejection as a statement of the true value of
the parameter.

A statistical test cannot prove that a hypothesis is true or false. Even when H₀
is true, sampling variation can produce a very large or very small value of the
test statistic, and the investigator may be misled into rejecting a true null
hypothesis. The act of rejecting a true null hypothesis is called a type I error.
The probability of making a type I error is termed the significance level and is
denoted by α. Similarly, even when H₀ is false (H₁ is true), sampling variation
can produce a value of the test statistic that is not extreme, and the
investigator may be misled into not rejecting (accepting) a false null
hypothesis. The act of not rejecting a false null hypothesis is called a type II
error. The probability of making a type II error is denoted by β, and 1 - β is
termed the power of the test.
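Simulation makes the two error probabilities concrete. The sketch below is illustrative only. It assumes normally distributed data with a known standard deviation, so a simple two-sided z-test applies. The sample size (n = 10), the alternative mean (0.8), and the random seed are arbitrary choices. When H₀ is true, the rejection rate estimates α. When H₀ is false, it estimates the power 1 - β.

```python
import random
from statistics import NormalDist

def z_test_rejects(sample, mu0, sigma, alpha):
    """Two-sided z-test of H0: mu = mu0 with sigma known; True means 'reject H0'."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / n ** 0.5)
    return abs(z) >= NormalDist().inv_cdf(1 - alpha / 2)

def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=10, alpha=0.05, reps=20000, seed=1):
    """Monte Carlo estimate of P(reject H0) when the data are i.i.d. N(true_mu, sigma^2)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        rejections += z_test_rejects(sample, mu0, sigma, alpha)
    return rejections / reps

type_i = rejection_rate(true_mu=0.0)   # H0 true: estimates alpha
power = rejection_rate(true_mu=0.8)    # H0 false: estimates 1 - beta
print(f"type I error rate ~ {type_i:.3f}, power ~ {power:.3f}, beta ~ {1 - power:.3f}")
```

With these settings the estimated type I error rate should be close to 0.05. The estimated power should be close to its theoretical value of roughly 0.72.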

The critical region of a test statistic is made up of extreme values of the test
statistic such that H₀ is rejected if the test statistic falls in the critical
region. The boundaries of a critical region are determined such that the
probability of rejecting a true null hypothesis is just equal to the chosen
level of significance. For a given value of α, the boundary values of a critical
region are called critical values. The choice of the level of significance is
left to the investigator, although α-values of 0.05 and 0.01 are frequently
used. The p-value is defined as the probability, computed under the assumption
that H₀ is true, of obtaining a value of the test statistic at least as extreme
(as large or as small) as the value calculated from the sample data. Being a
probability, the p-value ranges between 0 and 1. If the p-value is very small,
we prefer the alternative hypothesis H₁; that is, we reject H₀ in favor of H₁.
Conversely, if the p-value is large, we naturally prefer the null hypothesis;
that is, we do not reject (accept) H₀. Note that α is the maximum p-value at
which we decide to reject H₀; that is, H₀ is rejected whenever p ≤ α.

The steps in hypothesis testing can be summarized as follows: 


(1) The hypothesis under consideration is formulated by specifying the null 
and alternative hypotheses. 

(2) A value of the level of significance (α) is chosen in advance. The most
common values of α are 0.05 and 0.01.

(3) The test statistic for the problem is selected and its value for the sample
data is calculated.

(4) The sampling distribution of the test statistic, when H₀ is true and under
the assumed parent distribution of the study population, is determined. The
most common sampling distributions of a test statistic are the t, χ², and F
distributions.

(5) The critical value(s) corresponding to the chosen value of α in step 2 is
(are) determined from the theoretical values of the sampling distribution of
the test statistic identified in step 4, and the associated critical region is
defined.

(6) The null hypothesis H₀ is rejected or accepted depending upon whether the
value of the statistic calculated in step 3 falls inside or outside the
critical region.

(7) The p-value is calculated and reported. 
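The seven steps can be traced in a short Python sketch of a two-sided one-sample z-test. The data, the hypothesized mean μ₀ = 50, and the known σ = 4 are all hypothetical. A normal parent population with known σ is assumed, so the sampling distribution in step 4 is N(0, 1). If σ were unknown, a t statistic would be used instead.

```python
from statistics import NormalDist

# Hypothetical data and settings (not from the text): step 1 is
# H0: mu = 50 versus H1: mu != 50, assuming a normal population with known sigma = 4.
data = [52.1, 49.8, 55.3, 51.7, 53.9, 50.4, 54.2, 52.8]
mu0, sigma = 50.0, 4.0

alpha = 0.05                                   # step 2: level chosen in advance
n = len(data)
z = (sum(data) / n - mu0) / (sigma / n ** 0.5) # step 3: value of the test statistic
std_normal = NormalDist()                      # step 4: z ~ N(0, 1) when H0 is true
z_crit = std_normal.inv_cdf(1 - alpha / 2)     # step 5: critical region |z| >= z_crit
reject = abs(z) >= z_crit                      # step 6: decision
p_value = 2 * (1 - std_normal.cdf(abs(z)))     # step 7: two-sided p-value

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}, p = {p_value:.4f}")
```

For these data, z ≈ 1.79 falls outside the critical region |z| ≥ 1.96, so H₀ is not rejected. Correspondingly, the p-value (about 0.074) exceeds α. The critical-region decision and the rule "reject when p ≤ α" always agree.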


In a more formal statistical procedure, a sample size is chosen in advance that
guarantees an acceptably high statistical power (e.g., 1 - β = 0.90) of
rejecting H₀ at a given level of significance (e.g., α = 0.05) when a specified
alternative hypothesis is true.
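The following sketch shows how n can be chosen in advance. It uses the standard formula for a one-sided z-test of H₀: μ = μ₀ against H₁: μ = μ₀ + δ (δ > 0) with known σ, namely n ≥ ((z_{1-α} + z_{1-β}) σ / δ)². The choice of test, the effect size δ = 0.5σ, and the function names are assumptions made for the example; they are not a procedure from the text.

```python
import math
from statistics import NormalDist

def n_for_power(delta, sigma, alpha=0.05, power=0.90):
    """Smallest n giving at least the requested power for a one-sided z-test of
    H0: mu = mu0 against H1: mu = mu0 + delta, normal data with known sigma."""
    z = NormalDist().inv_cdf
    return math.ceil(((z(1 - alpha) + z(power)) * sigma / delta) ** 2)

def achieved_power(n, delta, sigma, alpha=0.05):
    """Power of the same one-sided z-test for a given sample size n."""
    std_normal = NormalDist()
    shift = delta * math.sqrt(n) / sigma
    return 1 - std_normal.cdf(std_normal.inv_cdf(1 - alpha) - shift)

n = n_for_power(delta=0.5, sigma=1.0)
print(n, round(achieved_power(n, 0.5, 1.0), 3))
```

Here n = 35 gives power slightly above 0.90, whereas n = 34 falls just short of it.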

A statistical test is called exact if its level of significance is exactly equal
to a given value of α. Often, it is not possible to obtain a test with a level of
significance exactly equal to α, and then the test is referred to as an
approximate test. An approximate test with a level of significance less than or
equal to α is called a conservative test. Similarly, an approximate test with a
level of significance greater than or equal to α is called a liberal test.
Conservative tests are generally preferred when only approximate tests are
available. However, if it is known that the actual level of significance of a
liberal test is not much greater than α, the liberal test can be recommended.
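A familiar example of a liberal test arises when σ is unknown and the sample is small. The shortcut is to compare the ratio x̄/(s/√n) with the normal critical value 1.96, whereas the exact test would use a t critical value. The simulation sketch below estimates the actual level of this shortcut at nominal α = 0.05, using normal data with H₀ true. The sample sizes, the number of replications, and the seed are arbitrary.

```python
import math
import random
from statistics import NormalDist

def actual_level(n, alpha=0.05, reps=20000, seed=7):
    """Monte Carlo size of the approximate test that rejects H0: mu = 0 when
    |xbar / (s / sqrt(n))| >= z_{1 - alpha/2}, for N(0, 1) data (so H0 is true)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(x) / n
        s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
        rejections += abs(xbar / (s / math.sqrt(n))) >= z_crit
    return rejections / reps

for n in (5, 30):
    print(n, round(actual_level(n), 3))
```

For n = 5 the estimated level is roughly 0.12, more than twice the nominal value, so the shortcut should not be recommended there. For n = 30 it is only slightly above 0.05 (about 0.06). That is the situation in which a liberal test may be acceptable.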


POINT ESTIMATION 


In point estimation of a parameter, a selected function of the sample values,
known as an estimator, is used to make the best guess we can concerning the
unknown value of the parameter. The idea of the “best” guess is that the
estimator yields a sample value which in some sense is close to the value of the
unknown parameter. The observed numerical value obtained by using an estimator
for a given sample is called an estimate. Since an estimate will assume
different values for different samples, it will be close to the true parameter
value for some samples and far from it for others.

Statistical theory uses various criteria to judge the “goodness” or “merit” of
an estimator. One desirable criterion or property used for this purpose is that
of unbiasedness. An estimator is said to be unbiased if its average or expected
value is equal to the parameter being estimated. More precisely, an estimator
$\hat{\theta}_n$ of a parameter θ is unbiased if

$$E(\hat{\theta}_n) = \theta.$$
An example of an unbiased estimator is the sample variance defined by

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1},$$


which is an unbiased estimator of the population variance σ². To see that S²
is an unbiased estimator of σ², one can write down every possible sample of
size n which could be selected from a population and compute S² for each
sample. If we calculate the average value of S² over all possible samples, we
obtain σ². Obviously, one cannot enumerate every possible sample when the
population is infinitely large, but one can derive the unbiasedness of an
estimator from its sampling distribution.
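The enumeration argument can be carried out exactly for a small example. The sketch below assumes sampling with replacement from a hypothetical population of four values. Under that assumption every ordered sample is equally likely, and the population variance σ² uses divisor N. Exact rational arithmetic (`fractions.Fraction`) avoids rounding.

```python
from fractions import Fraction
from itertools import product

def sample_variance(xs):
    """S^2 with divisor n - 1, computed exactly with Fractions."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

# Hypothetical finite population; samples are drawn WITH replacement,
# so all N**n ordered samples of size n are equally likely.
population = [2, 3, 7, 8]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance (divisor N)

n = 3
samples = list(product(population, repeat=n))          # every possible sample
average_s2 = sum(sample_variance(s) for s in samples) / len(samples)
print(sigma2, average_s2)
```

Both printed values equal 13/2. Dividing by n instead of n - 1 would instead give an average of (n - 1)σ²/n = 13/3, which shows the bias that the divisor n - 1 removes.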

Another desirable property of an estimator is that of consistency. An estimator
is said to be consistent if it approximates the true parameter value more
closely with increasing sample size. More precisely, an estimator
$\hat{\theta}_n$ is a consistent estimator of θ if for any positive real
number ε

$$\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| > \epsilon) = 0.$$
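For the sample mean of a normal population, the probability in the definition of consistency has a closed form, since X̄ₙ ~ N(μ, σ²/n). The sketch below evaluates it for ε = 0.2 and σ = 1 (illustrative values). The probability shrinks toward zero as n grows.

```python
from statistics import NormalDist

def prob_outside(n, eps, sigma=1.0):
    """P(|Xbar_n - mu| > eps) for i.i.d. N(mu, sigma^2) data, using Xbar_n ~ N(mu, sigma^2 / n)."""
    return 2 * (1 - NormalDist().cdf(eps * n ** 0.5 / sigma))

for n in (10, 100, 1000):
    print(n, prob_outside(n, eps=0.2))
```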


A further desirable property of an estimator is that of efficiency. An estimator
is said to be efficient if it has minimum variance¹ in the class of all unbiased
estimators. A minimum variance unbiased (MVU) estimator is frequently referred
to as the “best” estimator.
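Efficiency can be illustrated by comparing two unbiased estimators of a normal mean: the sample mean and the sample median. The median is unbiased here because the normal distribution is symmetric. The simulation sketch below uses standard normal samples of size 25, an arbitrary choice. The mean should show the smaller variance, about 1/25 = 0.04. For large n, the median's variance is larger by a factor of roughly π/2.

```python
import random
from statistics import median, pvariance

def sampling_variances(n=25, reps=5000, seed=3):
    """Monte Carlo variances of the sample mean and the sample median for N(0, 1) data."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(sum(x) / n)
        medians.append(median(x))
    return pvariance(means), pvariance(medians)

v_mean, v_median = sampling_variances()
print(round(v_mean, 4), round(v_median, 4), round(v_mean / v_median, 2))
```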


INTERVAL ESTIMATION 


In many studies it is not enough to obtain just a single value as an estimate of
the unknown parameter; an index or measure of the reliability or uncertainty
associated with the estimate is also required. A point estimate of a parameter
provides no such information. A method of estimation known as interval
estimation does provide this kind of information. In interval estimation of an
unknown parameter θ, an interval with endpoints θ_L and θ_U is constructed such
that

$$P(\theta_L \le \theta \le \theta_U) = 1 - \alpha. \qquad \text{(W.1)}$$


The quantity 1 - α in equation (W.1) is known as the confidence coefficient or
the level of confidence. Typical values of the confidence coefficient are 0.99,
0.95, and 0.90, although other values can also be chosen.
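The meaning of the confidence coefficient in (W.1) can be checked by simulation. Over many repeated samples, about 100(1 - α)% of the intervals should contain θ. The sketch below assumes normal data with known σ and uses the interval x̄ ± z_{1-α/2} σ/√n. The parameter values and the seed are arbitrary.

```python
import random
from statistics import NormalDist

def z_interval(sample, sigma, conf=0.95):
    """Two-sided interval xbar -/+ z_{1-alpha/2} * sigma / sqrt(n), sigma assumed known."""
    n = len(sample)
    xbar = sum(sample) / n
    half = NormalDist().inv_cdf(1 - (1 - conf) / 2) * sigma / n ** 0.5
    return xbar - half, xbar + half

def coverage(mu=10.0, sigma=2.0, n=8, conf=0.95, reps=20000, seed=11):
    """Fraction of simulated intervals (theta_L, theta_U) that contain the true mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        lo, hi = z_interval([rng.gauss(mu, sigma) for _ in range(n)], sigma, conf)
        hits += lo <= mu <= hi
    return hits / reps

print(coverage())
```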


¹ The variance of an estimator provides a measure of the sampling error that
describes the uncertainty of inference based on a particular sample. The square
root of the variance of an estimator is called the standard error of the
estimator.
A confidence interval given by equation (W.1) is called exact if the equality
holds exactly. Often, the equality holds only approximately, and the interval is
then referred to as an approximate interval. To emphasize the fact that an
interval is approximate, equation (W.1) is written as

$$P(\theta_L \le \theta \le \theta_U) \doteq 1 - \alpha. \qquad \text{(W.2)}$$

An approximate interval (W.2) is called conservative if

$$P(\theta_L \le \theta \le \theta_U) \ge 1 - \alpha. \qquad \text{(W.3)}$$

Similarly, an approximate interval (W.2) is called liberal if

$$P(\theta_L \le \theta \le \theta_U) \le 1 - \alpha. \qquad \text{(W.4)}$$


In general, conservative intervals are preferred when only approximate intervals
are available. However, if it is known that the actual confidence coefficient of
a liberal interval is not much lower than 1 - α, the liberal interval can be
recommended.

The interval given by equation (W.1) is called a two-sided confidence interval
because it has both lower and upper endpoints. In many situations an
investigator is interested in an interval with only one finite endpoint. Such an
interval is referred to as a one-sided interval. A one-sided interval that
satisfies the equation

$$P(\theta_L \le \theta < \infty) = 1 - \alpha$$

is called an upper confidence interval. Similarly, an interval that satisfies
the equation

$$P(-\infty < \theta \le \theta_U) = 1 - \alpha$$

is called a lower confidence interval. In this volume, we only consider
two-sided confidence intervals. However, one-sided intervals can be readily
obtained from two-sided intervals with only a minor modification.
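In the simplest case, the "minor modification" amounts to replacing z_{1-α/2} by z_{1-α}. The sketch below shows this for a normal mean with known σ, an assumption chosen for simplicity. It follows the naming used above: the upper confidence interval is [θ_L, ∞), and the lower confidence interval is (-∞, θ_U].

```python
import math
from statistics import NormalDist

def z_intervals(xbar, sigma, n, alpha=0.05):
    """Two-sided and one-sided confidence intervals for a normal mean, sigma known."""
    se = sigma / math.sqrt(n)
    z = NormalDist().inv_cdf
    two_sided = (xbar - z(1 - alpha / 2) * se, xbar + z(1 - alpha / 2) * se)
    upper = (xbar - z(1 - alpha) * se, math.inf)    # P(theta_L <= mu < inf) = 1 - alpha
    lower = (-math.inf, xbar + z(1 - alpha) * se)   # P(-inf < mu <= theta_U) = 1 - alpha
    return two_sided, upper, lower

two_sided, upper, lower = z_intervals(xbar=20.0, sigma=3.0, n=9)
print(two_sided, upper, lower)
```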

As with estimators, statistical theory uses several criteria to judge the
goodness or merit of an interval. Again, a desirable criterion or property of an
interval is that of unbiasedness. A confidence interval is said to be unbiased if
the probability of containing any value not equal to the true value of θ is less
than or equal to 1 - α. Another desirable property of an interval is that of
being uniformly most accurate (UMA). A confidence interval is said to be
uniformly most accurate if the interval has a smaller probability of containing
a value not equal to θ than any other interval with confidence coefficient
1 - α. A further desirable property of an interval is that of being uniformly
most accurate unbiased (UMAU). A confidence interval that is uniformly most
accurate within the class of all unbiased confidence intervals is called a
uniformly most accurate unbiased confidence interval. A final desirable property
of an interval is that of uniformly shortest length (USL). A confidence interval
is said to be of uniformly shortest length if its expected length is no greater
than that of any other interval with confidence coefficient 1 - α. Generally, if
a two-sided confidence interval is UMA (UMAU), then its expected length is the
shortest within the class of all (unbiased) confidence intervals. For a detailed
and rigorous discussion of the properties of confidence intervals, see Graybill
(1976, Section 2.9).


X SOME SELECTED LATIN SQUARES 


This appendix contains some selected Latin squares from 3 x 3 to 12 x 12.


3 x 3

ABC
BCA
CAB

4 x 4

1      2      3      4
ABCD   ABCD   ABCD   ABCD
BADC   BCDA   BDAC   BADC
CDBA   CDAB   CADB   CDAB
DCAB   DABC   DCBA   DCBA

5 x 5      6 x 6      7 x 7
ABCDE      ABCDEF     ABCDEFG
BAECD      BFDCAE     BCDEFGA
CDAEB      CDEFBA     CDEFGAB
DEBAC      DAFECB     DEFGABC
ECDBA      ECABFD     EFGABCD
           FEBADC     FGABCDE
                      GABCDEF

8 x 8       9 x 9        10 x 10
ABCDEFGH    ABCDEFGHI    ABCDEFGHIJ
BCDEFGHA    BCDEFGHIA    BCDEFGHIJA
CDEFGHAB    CDEFGHIAB    CDEFGHIJAB
DEFGHABC    DEFGHIABC    DEFGHIJABC
EFGHABCD    EFGHIABCD    EFGHIJABCD
FGHABCDE    FGHIABCDE    FGHIJABCDE
GHABCDEF    GHIABCDEF    GHIJABCDEF
HABCDEFG    HIABCDEFG    HIJABCDEFG
            IABCDEFGH    IJABCDEFGH
                         JABCDEFGHI

11 x 11


ABCDEFGHIJK 
BCDEFGHIJKA 
CDEFGHIJKAB 
DEFGHIJKABC 
EFGHIJKABCD 
FGHIJKABCDE 
GHIJKABCDEF 
HIJKABCDEFG 
IJKABCDEFGH 
JKABCDEFGHI 
KABCDEFGHIJ 
12 x 12


ABCDEFGHIJKL
BCDEFGHIJKLA 
CDEFGHIJKLAB 
DEFGHIJKLABC 
EFGHIJKLABCD 
FGHIJKLABCDE 
GHIJKLABCDEF 
HIJKLABCDEFG 
IJKLABCDEFGH 
JKLABCDEFGHI 
KLABCDEFGHIJ 
LABCDEFGHIJK 


Source: Cochran and Cox (1957, pp. 145-146). Used with permission. 
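The 7 x 7 through 12 x 12 squares above are cyclic: each row is the previous row shifted one place to the left. A minimal Python sketch of that construction follows, together with a check of the Latin property (each letter exactly once in every row and column). The function names are illustrative only.

```python
def cyclic_latin_square(n):
    """Rows are successive one-place left shifts of the letters A, B, C, ... (n <= 26)."""
    letters = [chr(ord("A") + k) for k in range(n)]
    return ["".join(letters[(i + j) % n] for j in range(n)) for i in range(n)]

def is_latin_square(rows):
    """True if the rows form an n x n array in which every symbol occurs
    exactly once in each row and once in each column."""
    n = len(rows)
    if n == 0 or any(len(r) != n for r in rows):
        return False
    symbols = set(rows[0])
    if len(symbols) != n:
        return False
    rows_ok = all(set(r) == symbols for r in rows)
    cols_ok = all({r[j] for r in rows} == symbols for j in range(n))
    return rows_ok and cols_ok

for row in cyclic_latin_square(7):
    print(row)
```

The same checker also accepts the non-cyclic squares above, such as the 6 x 6 square.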


Y SOME SELECTED GRAECO-LATIN SQUARES 


This appendix contains some selected Graeco-Latin squares from 7 x 7 to
12 x 12.


[Graeco-Latin squares, 7 x 7 to 12 x 12]


Source: Cochran and Cox (1957, pp. 146-147). Used with permission. 
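For odd n, a Graeco-Latin square can be built by superimposing two orthogonal Latin squares. Cell (i, j) carries the Latin letter (i + j) mod n and the Greek letter (2i + j) mod n. This is a standard textbook construction, offered here only as a sketch. It does not necessarily reproduce the squares tabulated from Cochran and Cox, and it does not cover even orders such as 8, 10, and 12, which need other methods.

```python
LATIN = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
GREEK = "αβγδεζηθικλμνξοπρστυφχψω"   # the 24 lowercase Greek letters

def graeco_latin_square(n):
    """Graeco-Latin square of odd order n (3 <= n <= 24): cell (i, j) gets the
    Latin letter (i + j) mod n and the Greek letter (2i + j) mod n."""
    if n % 2 == 0 or not 3 <= n <= len(GREEK):
        raise ValueError("this construction needs an odd n between 3 and 24")
    return [[LATIN[(i + j) % n] + GREEK[(2 * i + j) % n] for j in range(n)]
            for i in range(n)]

def is_graeco_latin(square):
    """Both component squares are Latin and every (Latin, Greek) pair occurs once."""
    n = len(square)

    def is_latin(rows):
        return (all(len(set(r)) == n for r in rows)
                and all(len({r[j] for r in rows}) == n for j in range(n)))

    latin = [[cell[0] for cell in row] for row in square]
    greek = [[cell[1] for cell in row] for row in square]
    pairs = {cell for row in square for cell in row}
    return is_latin(latin) and is_latin(greek) and len(pairs) == n * n

for row in graeco_latin_square(7):
    print(" ".join(row))
```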


Z PROC MIXED OUTPUTS FOR SOME SELECTED 
WORKED EXAMPLES 


In this appendix, we include some additional outputs using SAS PROC MIXED
for some selected worked examples given in Sections 4.16, 7.9, and 10.6. The
outputs for these examples using PROC GLM were included in Figures 4.4,
7.4, and 10.9. We did not include the PROC MIXED outputs there because the
methodology underlying these analyses has not been discussed in this volume.
We hope that readers with adequate background will find these results
interesting and useful because they contain estimates of variance components
obtained by the maximum likelihood (ML) and the restricted maximum
likelihood (REML) procedures. It should be remarked that, for balanced designs,
when the analysis of variance estimates of the variance components are
nonnegative, they are identical to the REML estimates given here. However, the
results of the F tests for the fixed effects are generally not equivalent to
the significance tests produced by the PROC MIXED procedure.
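As a small illustration of the analysis of variance estimators mentioned above, the sketch below computes them for the simplest balanced case. That case is the one-way random effects model y_ij = μ + a_i + e_ij, and the estimators come from equating mean squares to their expectations: E(MS_A) = σ²_e + nσ²_a and E(MS_E) = σ²_e. This is not the two- or three-factor mixed model of the worked examples, and the data are hypothetical. A negative estimate of σ²_a is possible; that is the case excluded in the remark about agreement with REML.

```python
from fractions import Fraction

def anova_variance_components(groups):
    """ANOVA (method-of-moments) estimates for the balanced one-way random model.
    Returns (estimate of sigma_a^2, estimate of sigma_e^2); the first may be negative."""
    a, n = len(groups), len(groups[0])
    assert all(len(g) == n for g in groups), "design must be balanced"
    means = [Fraction(sum(g), n) for g in groups]
    grand = sum(means) / a
    ms_a = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_e = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (a * (n - 1))
    return (ms_a - ms_e) / n, ms_e

sa2, se2 = anova_variance_components([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(sa2, se2)
```

For these hypothetical data, MS_A = 27 and MS_E = 1, giving σ²_a = 26/3 and σ²_e = 1.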
DATA YIELDLOD;
INPUT AGING MIX YIELD;
DATALINES;
1 1 574
1 1 564
1 1 550
1 2 524
. . .
2 3 1055
;
PROC MIXED;
CLASSES AGING MIX;
MODEL YIELD=AGING;
RANDOM MIX AGING*MIX;
RUN;

The SAS System
The MIXED Procedure

Class Level Information
Class     Levels    Values
AGING          2    1 2
MIX            3    1 2 3

REML Estimation Iteration History
Iter.   Eval.   Objective        Criterion
0       1       124.13401764
1       1       122.98420101     0.00000000
Convergence criteria met.

Covariance Parameter Estimates (REML)
Cov Parm      Estimate
MIX           182.59259259
AGING*MIX      10.31481481
Residual      532.22222222

Model Fitting Information for YIELD
Description                  Value
Observations               18.0000
Res Log Likelihood        -76.1951
Akaike's Inf. Crit.       -79.1951
Schwarz's Bayes. Crit.    -80.3540
-2 Res Log Likelihood     152.3902

Tests of Fixed Effects
Source    NDF   DDF   Type III F   Pr > F
AGING       1     2      1965.80   0.0005

(i) SAS PROC MIXED Instructions and Output for the Worked Example in Section 4.16


DATA GLYCOGEN;
INPUT TREATMNT $ RAT . . .
. . .
PROC MIXED;
CLASS TREATMNT RAT PREPARAT;
MODEL GLY=TREATMNT;
RANDOM RAT(TREATMNT) PREPARAT(RAT TREATMNT);
RUN;

The SAS System
The MIXED Procedure

REML Estimation Iteration History
Iter.   Eval.   Objective        Criterion
0       1       171.91789976
1       1       158.97132463     0.00000000
Convergence criteria met.

Covariance Parameter Estimates (REML)
Cov Parm                  Estimate
RAT(TREATMNT)             36.06481481
PREPARAT(TREATMNT*RAT)    14.16666667
Residual                  21.16666667

Model Fitting Information for GLYCOGEN
Description                  Value
Observations               36.0000
Res Log Likelihood        -109.811
Akaike's Inf. Crit.       -112.811
Schwarz's Bayes. Crit.    -115.055
-2 Res Log Likelihood     219.6213

Tests of Fixed Effects
Source     NDF   DDF   Type III F   Pr > F
TREATMNT     2     3         2.93   0.1971

(ii) SAS PROC MIXED Instructions and Output for the Worked Example in Section 7.9


DATA SOYBEAN;
INPUT BLOCK SPACING VARIETY $ YIELD;
DATALINES;
1 18 OM 33.6
. . .
;
PROC MIXED;
CLASS BLOCK VARIETY SPACING;
MODEL YIELD=VARIETY SPACING VARIETY*SPACING;
RANDOM BLOCK VARIETY*BLOCK SPACING*BLOCK;
RUN;

The SAS System
The MIXED Procedure

Class Level Information
Class     Levels    Values
BLOCK          6    1 2 3 4 5 6
VARIETY        2    B OM
SPACING        5    18 24 30 36 42

REML Estimation Iteration History
Iter.   Eval.   Objective        Criterion
0       1       146.43893356
1       3       146.27969658     0.00011090
2       2       146.27230705     0.00000161
3       1       146.27218808     0.00000000
Convergence criteria met.

Covariance Parameter Estimates (REML)
Cov Parm         Estimate
BLOCK            0.14028209
VARIETY*BLOCK    0.00000000
SPACING*BLOCK    0.00000000
Residual         4.66840543

Model Fitting Information for YIELD
Description                  Value
Observations               60.0000
Res Log Likelihood        -119.083
Akaike's Inf. Crit.       -123.083
Schwarz's Bayes. Crit.    -126.907
-2 Res Log Likelihood     238.1660

Tests of Fixed Effects
Source    NDF   DDF   Type III F   Pr > F
VARIETY     1     5       102.33   0.0002
SPACING     4    20        11.04   0.0001
VAR*SPA     4    20         1.36   0.2820

(iii) SAS PROC MIXED Instructions and Output for the Worked Example in Section 10.6


Statistical Tables and Charts 


Table I. Cumulative Standard Normal Distribution 


This table gives the area under the standard normal curve from -∞ to the
indicated values of z. The values of z are provided from 0.00 to 3.99 in
increments of 0.01 units.


Examples: (i) P(Z ≤ 1.47) = 0.9292.
(ii) P(Z > 2.12) = 1 - P(Z ≤ 2.12) = 1 - 0.9830 = 0.0170.
(iii) P(Z ≤ -2.51) = 1 - P(Z ≤ 2.51) = 1 - 0.9940 = 0.0060.
(iv) P(-1.21 ≤ Z ≤ 2.68) = P(Z ≤ 2.68) - P(Z ≤ -1.21) = 0.9963 - (1 - 0.8869) =
0.8832.

z P(Z ≤ z) z P(Z ≤ z) z P(Z ≤ z) z P(Z ≤ z) z P(Z ≤ z) z P(Z ≤ z) z P(Z ≤ z) z P(Z ≤ z)


0.00 0.5000 0.50 0.6915 1.00 0.8413 1.50 0.9332 2.00 0.9772 2.50 0.9938 3.00 0.9987 3.50 0.9998 
0.01 0.5040 0.51 0.6950 1.01 0.8438 1.51 0.9345 2.01 0.9778 2.51 0.9940 3.01 0.9987 3.51 0.9998 
0.02 0.5080 0.52 0.6985 1.02 0.8461 1.52 0.9357 2.02 0.9783 2.52 0.9941 3.02 0.9987 3.52 0.9998 
0.03 0.5120 0.53 0.7019 1.03 0.8485 1.53 0.9370 2.03 0.9788 2.53 0.9943 3.03 0.9988 3.53 0.9998 
0.04 0.5160 0.54 0.7054 1.04 0.8508 1.54 0.9382 2.04 0.9793 2.54 0.9945 3.04 0.9988 3.54 0.9998 


0.05 0.5199 0.55 0.7088 1.05 0.8531 1.55 0.9394 2.05 0.9798 2.55 0.9946 3.05 0.9989 3.55 0.9998 
0.06 0.5239 0.56 0.7123 1.06 0.8554 1.56 0.9406 2.06 0.9803 2.56 0.9948 3.06 0.9989 3.56 0.9998 
0.07 0.5279 0.57 0.7157 1.07 0.8577 1.57 0.9418 2.07 0.9808 2.57 0.9949 3.07 0.9989 3.57 0.9998 
0.08 0.5319 0.58 0.7190 1.08 0.8599 1.58 0.9429 2.08 0.9812 2.58 0.9951 3.08 0.9990 3.58 0.9998 
0.09 0.5359 0.59 0.7224 1.09 0.8621 1.59 0.9441 2.09 0.9817 2.59 0.9952 3.09 0.9990 3.59 0.9998 


0.10 0.5398 0.60 0.7257 1.10 0.8643 1.60 0.9452 2.10 0.9821 2.60 0.9953 3.10 0.9990 3.60 0.9998 
0.11 0.5438 0.61 0.7291 1.11 0.8665 1.61 0.9463 2.11 0.9826 2.61 0.9955 3.11 0.9991 3.61 0.9998 
0.12 0.5478 0.62 0.7324 1.12 0.8686 1.62 0.9474 2.12 0.9830 2.62 0.9956 3.12 0.9991 3.62 0.9999 
0.13 0.5517 0.63 0.7357 1.13 0.8708 1.63 0.9484 2.13 0.9834 2.63 0.9957 3.13 0.9991 3.63 0.9999 
0.14 0.5557 0.64 0.7389 1.14 0.8729 1.64 0.9495 2.14 0.9838 2.64 0.9959 3.14 0.9992 3.64 0.9999 


0.15 0.5596 0.65 0.7422 1.15 0.8749 1.65 0.9505 2.15 0.9842 2.65 0.9960 3.15 0.9992 3.65 0.9999 
0.16 0.5636 0.66 0.7454 1.16 0.8770 1.66 0.9515 2.16 0.9846 2.66 0.9961 3.16 0.9992 3.66 0.9999 
0.17 0.5675 0.67 0.7486 1.17 0.8790 1.67 0.9525 2.17 0.9850 2.67 0.9962 3.17 0.9992 3.67 0.9999 
0.18 0.5714 0.68 0.7517 1.18 0.8810 1.68 0.9535 2.18 0.9854 2.68 0.9963 3.18 0.9993 3.68 0.9999 
0.19 0.5753 0.69 0.7549 1.19 0.8830 1.69 0.9545 2.19 0.9857 2.69 0.9964 3.19 0.9993 3.69 0.9999 


0.20 0.5793 0.70 0.7580 1.20 0.8849 1.70 0.9554 2.20 0.9861 2.70 0.9965 3.20 0.9993 3.70 0.9999 
0.21 0.5832 0.71 0.7611 1.21 0.8869 1.71 0.9564 2.21 0.9864 2.71 0.9966 3.21 0.9993 3.71 0.9999 
0.22 0.5871 0.72 0.7642 1.22 0.8888 1.72 0.9573 2.22 0.9868 2.72 0.9967 3.22 0.9994 3.72 0.9999 
0.23 0.5910 0.73 0.7673 1.23 0.8907 1.73 0.9582 2.23 0.9871 2.73 0.9968 3.23 0.9994 3.73 0.9999 
0.24 0.5948 0.74 0.7703 1.24 0.8925 1.74 0.9591 2.24 0.9875 2.74 0.9969 3.24 0.9994 3.74 0.9999 


0.25 0.5987 0.75 0.7734 1.25 0.8944 1.75 0.9599 2.25 0.9878 2.75 0.9970 3.25 0.9994 3.75 0.9999 
0.26 0.6026 0.76 0.7764 1.26 0.8962 1.76 0.9608 2.26 0.9881 2.76 0.9971 3.26 0.9994 3.76 0.9999 
0.27 0.6064 0.77 0.7794 1.27 0.8980 1.77 0.9616 2.27 0.9884 2.77 0.9972 3.27 0.9995 3.77 0.9999 
0.28 0.6103 0.78 0.7823 1.28 0.8997 1.78 0.9625 2.28 0.9887 2.78 0.9973 3.28 0.9995 3.78 0.9999 
0.29 0.6141 0.79 0.7852 1.29 0.9015 1.79 0.9633 2.29 0.9890 2.79 0.9974 3.29 0.9995 3.79 0.9999 


0.30 0.6179 0.80 0.7881 1.30 0.9032 1.80 0.9641 2.30 0.9893 2.80 0.9974 3.30 0.9995 3.80 0.9999 
0.31 0.6217 0.81 0.7910 1.31 0.9049 1.81 0.9649 2.31 0.9896 2.81 0.9975 3.31 0.9995 3.81 0.9999 
0.32 0.6255 0.82 0.7939 1.32 0.9066 1.82 0.9656 2.32 0.9898 2.82 0.9976 3.32 0.9995 3.82 0.9999 
0.33 0.6293 0.83 0.7967 1.33 0.9082 1.83 0.9664 2.33 0.9901 2.83 0.9977 3.33 0.9996 3.83 0.9999 
0.34 0.6331 0.84 0.7995 1.34 0.9099 1.84 0.9671 2.34 0.9904 2.84 0.9977 3.34 0.9996 3.84 0.9999 


0.35 0.6368 0.85 0.8023 1.35 0.9115 1.85 0.9678 2.35 0.9906 2.85 0.9978 3.35 0.9996 3.85 0.9999
0.36 0.6406 0.86 0.8051 1.36 0.9131 1.86 0.9686 2.36 0.9909 2.86 0.9979 3.36 0.9996 3.86 0.9999 
0.37 0.6443 0.87 0.8078 1.37 0.9147 1.87 0.9693 2.37 0.9911 2.87 0.9979 3.37 0.9996 3.87 1.0000 
0.38 0.6480 0.88 0.8106 1.38 0.9162 1.88 0.9699 2.38 0.9913 2.88 0.9980 3.38 0.9996 3.88 1.0000 
0.39 0.6517 0.89 0.8133 1.39 0.9177 1.89 0.9706 2.39 0.9916 2.89 0.9981 3.39 0.9997 3.89 1.0000 


0.40 0.6554 0.90 0.8159 1.40 0.9192 1.90 0.9713 2.40 0.9918 2.90 0.9981 3.40 0.9997 3.90 1.0000 
0.41 0.6591 0.91 0.8186 1.41 0.9207 1.91 0.9719 2.41 0.9920 2.91 0.9982 3.41 0.9997 3.91 1.0000 
0.42 0.6628 0.92 0.8212 1.42 0.9222 1.92 0.9726 2.42 0.9922 2.92 0.9982 3.42 0.9997 3.92 1.0000 
0.43 0.6664 0.93 0.8238 1.43 0.9236 1.93 0.9732 2.43 0.9925 2.93 0.9983 3.43 0.9997 3.93 1.0000 
0.44 0.6700 0.94 0.8264 1.44 0.9251 1.94 0.9738 2.44 0.9927 2.94 0.9984 3.44 0.9997 3.94 1.0000 


0.45 0.6736 0.95 0.8289 1.45 0.9265 1.95 0.9744 2.45 0.9929 2.95 0.9984 3.45 0.9997 3.95 1.0000 
0.46 0.6772 0.96 0.8315 1.46 0.9279 1.96 0.9750 2.46 0.9931 2.96 0.9985 3.46 0.9997 3.96 1.0000 
0.47 0.6808 0.97 0.8340 1.47 0.9292 1.97 0.9756 2.47 0.9932 2.97 0.9985 3.47 0.9997 3.97 1.0000 
0.48 0.6844 0.98 0.8365 1.48 0.9306 1.98 0.9761 2.48 0.9934 2.98 0.9986 3.48 0.9997 3.98 1.0000 
0.49 0.6879 0.99 0.8389 1.49 0.9319 1.99 0.9767 2.49 0.9936 2.99 0.9986 3.49 0.9998 3.99 1.0000 


Computed Using IMSL* Library Functions. 


* IMSL (International Mathematical and Statistical Library) is a registered trade mark of IMSL, 
Inc. 
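The entries of Table I can be reproduced with the identity P(Z ≤ z) = ½[1 + erf(z/√2)]. A minimal Python sketch using `math.erf` follows; it reproduces the worked examples above.

```python
import math

def std_normal_cdf(z):
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(std_normal_cdf(1.47), 4))                            # example (i)
print(round(1 - std_normal_cdf(2.12), 4))                        # example (ii)
print(round(std_normal_cdf(-2.51), 4))                           # example (iii)
print(round(std_normal_cdf(2.68) - std_normal_cdf(-1.21), 4))    # example (iv)
```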
Table II. Percentage Points of the Standard Normal Distribution


This table is the inverse of Table I. The entries in the table give z values
(percentiles) corresponding to a given cumulative probability (i.e., P(Z ≤ z)),
which represents all the area to the left of the z value. The values of P(Z ≤ z)
are given from 0.0001 to 0.9999.


Examples: (i) P(Z ≤ z) = 0.005, z = -2.57583.
(ii) P(Z ≤ z) = 0.200, z = -0.84162.
(iii) P(Z ≤ z) = 0.800, z = 0.84162.
(iv) P(Z ≤ z) = 0.950, z = 1.64485.


P(Z ≤ z) z P(Z ≤ z) z P(Z ≤ z) z P(Z ≤ z) z P(Z ≤ z) z

0.0001 -3.71902 0.165 -0.97411 0.390 -0.27932 0.615 0.29237 0.8400 0.99446
0.0002 -3.54008 0.170 -0.95417 0.395 -0.26631 0.620 0.30548 0.8450 1.01522
0.0003 -3.43161 0.175 -0.93459 0.400 -0.25335 0.625 0.31864 0.8500 1.03643
0.0004 -3.35279 0.180 -0.91537 0.405 -0.24043 0.630 0.33185 0.8550 1.05812
0.0005 -3.29053 0.185 -0.89647 0.410 -0.22754 0.635 0.34513 0.8600 1.08032
0.0010 -3.09023 0.190 -0.87790 0.415 -0.21470 0.640 0.35846 0.8650 1.10306
0.0020 -2.87816 0.195 -0.85962 0.420 -0.20189 0.645 0.37186 0.8700 1.12639
0.0030 -2.74778 0.200 -0.84162 0.425 -0.18912 0.650 0.38532 0.8750 1.15035
0.0040 -2.65207 0.205 -0.82389 0.430 -0.17637 0.655 0.39886 0.8800 1.17499
0.0050 -2.57583 0.210 -0.80642 0.435 -0.16366 0.660 0.41246 0.8850 1.20036
0.0060 -2.51214 0.215 -0.78919 0.440 -0.15097 0.665 0.42615 0.8900 1.22653
0.0070 -2.45726 0.220 -0.77219 0.445 -0.13830 0.670 0.43991 0.8950 1.25357
0.0080 -2.40892 0.225 -0.75542 0.450 -0.12566 0.675 0.45376 0.9000 1.28155
0.0090 -2.36562 0.230 -0.73885 0.455 -0.11304 0.680 0.46770 0.9050 1.31058
0.0100 -2.32635 0.235 -0.72248 0.460 -0.10043 0.685 0.48173 0.9100 1.34076
0.0150 -2.17009 0.240 -0.70630 0.465 -0.08784 0.690 0.49585 0.9150 1.37220
0.0200 -2.05375 0.245 -0.69031 0.470 -0.07527 0.695 0.51007 0.9200 1.40507
0.0250 -1.95996 0.250 -0.67449 0.475 -0.06271 0.700 0.52440 0.9250 1.43953
0.0300 -1.88079 0.255 -0.65884 0.480 -0.05015 0.705 0.53884 0.9300 1.47579
0.0350 -1.81191 0.260 -0.64335 0.485 -0.03761 0.710 0.55338 0.9350 1.51410
0.0400 -1.75069 0.265 -0.62801 0.490 -0.02507 0.715 0.56805 0.9400 1.55477
0.0450 -1.69540 0.270 -0.61281 0.495 -0.01253 0.720 0.58284 0.9450 1.59819
0.0500 -1.64485 0.275 -0.59776 0.500 0.00000 0.725 0.59776 0.9500 1.64485
0.0550 -1.59819 0.280 -0.58284 0.505 0.01253 0.730 0.61281 0.9550 1.69540
0.0600 -1.55477 0.285 -0.56805 0.510 0.02507 0.735 0.62801 0.9600 1.75069
0.0650 -1.51410 0.290 -0.55338 0.515 0.03761 0.740 0.64335 0.9650 1.81191
0.0700 -1.47579 0.295 -0.53884 0.520 0.05015 0.745 0.65884 0.9700 1.88079
0.0750 -1.43953 0.300 -0.52440 0.525 0.06271 0.750 0.67449 0.9750 1.95996
0.0800 -1.40507 0.305 -0.51007 0.530 0.07527 0.755 0.69031 0.9800 2.05375
0.0850 -1.37220 0.310 -0.49585 0.535 0.08784 0.760 0.70630 0.9850 2.17009
0.0900 -1.34076 0.315 -0.48173 0.540 0.10043 0.765 0.72248 0.9900 2.32635
0.0950 -1.31058 0.320 -0.46770 0.545 0.11304 0.770 0.73885 0.9910 2.36562
0.1000 -1.28155 0.325 -0.45376 0.550 0.12566 0.775 0.75542 0.9920 2.40892
0.1050 -1.25357 0.330 -0.43991 0.555 0.13830 0.780 0.77219 0.9930 2.45726
0.1100 -1.22653 0.335 -0.42615 0.560 0.15097 0.785 0.78919 0.9940 2.51214
0.1150 -1.20036 0.340 -0.41246 0.565 0.16366 0.790 0.80642 0.9950 2.57583
0.1200 -1.17499 0.345 -0.39886 0.570 0.17637 0.795 0.82389 0.9960 2.65207
0.1250 -1.15035 0.350 -0.38532 0.575 0.18912 0.800 0.84162 0.9970 2.74778
0.1300 -1.12639 0.355 -0.37186 0.580 0.20189 0.805 0.85962 0.9980 2.87816
0.1350 -1.10306 0.360 -0.35846 0.585 0.21470 0.810 0.87790 0.9990 3.09023
0.1400 -1.08032 0.365 -0.34513 0.590 0.22754 0.815 0.89647 0.9995 3.29053
0.1450 -1.05812 0.370 -0.33185 0.595 0.24043 0.820 0.91537 0.9996 3.35279
0.1500 -1.03643 0.375 -0.31864 0.600 0.25335 0.825 0.93459 0.9997 3.43161
0.1550 -1.01522 0.380 -0.30548 0.605 0.26631 0.830 0.95417 0.9998 3.54008
0.1600 -0.99446 0.385 -0.29237 0.610 0.27932 0.835 0.97411 0.9999 3.71902


Computed Using IMSL* Library Functions. 


* IMSL (International Mathematical and Statistical Library) is a registered trade mark of IMSL, 
Inc. 
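Table II entries can likewise be reproduced with the inverse normal distribution function in Python's standard library, `statistics.NormalDist.inv_cdf` (available in Python 3.8 and later). The short sketch below recomputes the four worked examples.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1
for p in (0.005, 0.200, 0.800, 0.950):
    print(p, round(std_normal.inv_cdf(p), 5))
```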
Table III. Critical Values of the Student’s t Distribution


This table gives the critical values of the Student’s t distribution for degrees
of freedom ν = 1 (1) 30, 40, 60, 120, ∞. The α-values are given corresponding
to upper-tail tests of significance. The critical values are given corresponding
to one-tail α-levels equal to 0.40, 0.30, 0.20, 0.15, 0.10, 0.05, 0.025, 0.02,
0.015, 0.01, 0.0075, 0.005, 0.0025, and 0.0005. Since the distribution of t is
symmetrical about zero, the one-tailed significance level of α corresponds to
the two-tailed significance level of 2α. All the critical values are provided to
three decimal places.


Critical value: t[ν, 1 - α], with upper-tail area α.


Examples: (i) For ν = 15, α = 0.01, the desired critical value from the table
is t[15, 0.99] = 2.602.

(ii) For ν = 60, α = 0.05, the desired critical value from the table
is t[60, 0.95] = 1.671.

One-tailed α
   ν    0.40    0.30    0.20    0.15    0.10    0.05   0.025    0.02   0.015    0.01  0.0075   0.005  0.0025  0.0005

   1   0.325   0.727   1.376   1.963   3.078   6.314  12.706  15.895  21.205  31.821  42.434  63.657 127.322 636.619
   2   0.289   0.617   1.061   1.386   1.886   2.920   4.303   4.849   5.643   6.965   8.073   9.925  14.089  31.598
   3   0.277   0.584   0.978   1.250   1.638   2.353   3.182   3.482   3.896   4.541   5.047   5.841   7.453  12.924
   4   0.271   0.569   0.941   1.190   1.533   2.132   2.776   2.999   3.298   3.747   4.088   4.604   5.598   8.610
   5   0.267   0.559   0.920   1.156   1.476   2.015   2.571   2.757   3.003   3.365   3.634   4.032   4.773   6.869

   6   0.265   0.553   0.906   1.134   1.440   1.943   2.447   2.612   2.829   3.143   3.372   3.707   4.317   5.959
   7   0.263   0.549   0.896   1.119   1.415   1.895   2.365   2.517   2.715   2.998   3.203   3.499   4.029   5.408
   8   0.262   0.546   0.889   1.108   1.397   1.860   2.306   2.449   2.634   2.896   3.085   3.355   3.833   5.041
   9   0.261   0.543   0.883   1.100   1.383   1.833   2.262   2.398   2.574   2.821   2.998   3.250   3.690   4.781
  10   0.260   0.542   0.879   1.093   1.372   1.812   2.228   2.359   2.527   2.764   2.932   3.169   3.581   4.587

  11   0.260   0.540   0.876   1.088   1.363   1.796   2.201   2.328   2.491   2.718   2.879   3.106   3.497   4.437
  12   0.259   0.539   0.873   1.083   1.356   1.782   2.179   2.303   2.461   2.681   2.836   3.055   3.428   4.318
  13   0.259   0.537   0.870   1.079   1.350   1.771   2.160   2.282   2.436   2.650   2.801   3.012   3.372   4.221
  14   0.258   0.537   0.868   1.076   1.345   1.761   2.145   2.264   2.415   2.624   2.771   2.977   3.326   4.140
  15   0.258   0.536   0.866   1.074   1.341   1.753   2.131   2.249   2.397   2.602   2.746   2.947   3.286   4.073

  16   0.258   0.535   0.865   1.071   1.337   1.746   2.120   2.235   2.382   2.583   2.724   2.921   3.252   4.015
  17   0.257   0.534   0.863   1.069   1.333   1.740   2.110   2.224   2.368   2.567   2.706   2.898   3.222   3.965
  18   0.257   0.534   0.862   1.067   1.330   1.734   2.101   2.214   2.356   2.552   2.689   2.878   3.197   3.922
  19   0.257   0.533   0.861   1.066   1.328   1.729   2.093   2.205   2.346   2.539   2.674   2.861   3.174   3.883
  20   0.257   0.533   0.860   1.064   1.325   1.725   2.086   2.197   2.336   2.528   2.661   2.845   3.153   3.849

  21   0.257   0.532   0.859   1.063   1.323   1.721   2.080   2.189   2.328   2.518   2.649   2.831   3.135   3.819
  22   0.256   0.532   0.858   1.061   1.321   1.717   2.074   2.183   2.320   2.508   2.639   2.819   3.119   3.792
  23   0.256   0.532   0.858   1.060   1.319   1.714   2.069   2.177   2.313   2.500   2.629   2.807   3.104   3.768
  24   0.256   0.531   0.857   1.059   1.318   1.711   2.064   2.172   2.307   2.492   2.620   2.797   3.091   3.745
  25   0.256   0.531   0.856   1.058   1.316   1.708   2.060   2.167   2.301   2.485   2.612   2.787   3.078   3.725

  26   0.256   0.531   0.856   1.058   1.315   1.706   2.056   2.162   2.296   2.479   2.605   2.779   3.067   3.707
  27   0.256   0.531   0.855   1.057   1.314   1.703   2.052   2.158   2.291   2.473   2.598   2.771   3.057   3.690
  28   0.256   0.530   0.855   1.056   1.313   1.701   2.048   2.154   2.286   2.467   2.592   2.763   3.047   3.674
  29   0.256   0.530   0.854   1.055   1.311   1.699   2.045   2.150   2.282   2.462   2.586   2.756   3.038   3.659
  30   0.256   0.530   0.854   1.055   1.310   1.697   2.042   2.147   2.278   2.457   2.581   2.750   3.030   3.646

  40   0.255   0.529   0.851   1.050   1.303   1.684   2.021   2.123   2.250   2.423   2.542   2.704   2.971   3.551
  60   0.254   0.527   0.848   1.045   1.296   1.671   2.000   2.099   2.223   2.390   2.504   2.660   2.915   3.460
 120   0.254   0.526   0.845   1.041   1.289   1.658   1.980   2.076   2.196   2.358   2.468   2.617   2.860   3.373
   ∞   0.253   0.524   0.842   1.036   1.282   1.645   1.960   2.054   2.170   2.326   2.432   2.576   2.807   3.291


From J. Neter, M. H. Kutner, C. J. Nachtsheim, and W. Wasserman, Applied
Linear Statistical Models, Fourth Edition, © 1996 by Richard D. Irwin, Inc.,
Chicago. Reprinted by permission (from Table B.2).
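Python's standard library has no Student's t quantile function. The sketch below therefore finds t[ν, 1 - α] by bisection on a numerically integrated t distribution function, applying composite Simpson's rule to the density. It is meant for moderate entries only. The default bracket [0, 50] and the fixed number of Simpson steps do not cover the extreme entries for ν = 1 and 2 at very small α. Where SciPy is available, `scipy.stats.t.ppf(1 - alpha, nu)` returns the same quantities directly.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_cdf(t, nu, steps=2000):
    """P(T <= t) for t >= 0: 0.5 plus the integral of the density over [0, t],
    by composite Simpson's rule (steps must be even)."""
    h = t / steps
    s = t_pdf(0.0, nu) + t_pdf(t, nu)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    return 0.5 + s * h / 3

def t_critical(nu, alpha, hi=50.0, tol=1e-7):
    """Upper-tail critical value t[nu, 1 - alpha], found by bisection on t_cdf."""
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(15, 0.01), 3), round(t_critical(60, 0.05), 3))
```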
Table IV. Critical Values of the Chi-Square Distribution


This table gives the critical values of the chi-square (χ²) distribution for
degrees of freedom ν = 1 (1) 30 (10) 100. The critical values are given
corresponding to α-levels equal to 0.995, 0.990, 0.975, 0.95, 0.90, 0.75, 0.50,
0.25, 0.10, 0.05, 0.025, 0.01, and 0.005. All the critical values are provided to
two decimal places, except for a few very small entries for ν = 1.


Critical value: χ²[ν, 1 - α], with upper-tail area α.


Examples: (i) For ν = 15, α = 0.05, the desired critical value from
the table is χ²[15, 0.95] = 25.00.
(ii) For ν = 20, α = 0.90, the desired critical value from
the table is χ²[20, 0.10] = 12.44.
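The entries χ²[ν, 1 - α] can also be recovered by bisection on the chi-square distribution function. Here that function is computed from the power series of the regularized lower incomplete gamma function P(ν/2, x/2). This is an illustrative sketch; with SciPy, `scipy.stats.chi2.ppf(1 - alpha, nu)` gives the same values.

```python
import math

def chi2_cdf(x, nu):
    """P(X <= x) for a chi-square variable with nu d.f., via the series for the
    regularized lower incomplete gamma function P(nu/2, x/2)."""
    if x <= 0:
        return 0.0
    a, y = nu / 2, x / 2
    term = total = 1.0 / a
    k = 1
    while term > total * 1e-16:      # series terms eventually decrease geometrically
        term *= y / (a + k)
        total += term
        k += 1
    return math.exp(-y + a * math.log(y) - math.lgamma(a)) * total

def chi2_critical(nu, alpha, tol=1e-9):
    """Critical value chi2[nu, 1 - alpha] (upper-tail area alpha), by bisection."""
    lo, hi = 0.0, nu + 10 * math.sqrt(2 * nu) + 10
    while chi2_cdf(hi, nu) < 1 - alpha:   # widen the bracket if needed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_cdf(mid, nu) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(chi2_critical(15, 0.05), 2), round(chi2_critical(20, 0.90), 2))
```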
α
   ν     0.995     0.990     0.975     0.950     0.900     0.750     0.500     0.250     0.100

   1 0.0000393  0.000157  0.000982   0.00393      0.02      0.10      0.45      1.32      2.71
   2      0.01      0.02      0.05      0.10      0.21      0.58      1.39      2.77      4.61
   3      0.07      0.11      0.22      0.35      0.58      1.21      2.37      4.11      6.25
   4      0.21      0.30      0.48      0.71      1.06      1.92      3.36      5.39      7.78
   5      0.41      0.55      0.83      1.15      1.61      2.67      4.35      6.63      9.24

   6      0.68      0.87      1.24      1.64      2.20      3.45      5.35      7.84     10.64
   7      0.99      1.24      1.69      2.17      2.83      4.25      6.35      9.04     12.02
   8      1.34      1.65      2.18      2.73      3.49      5.07      7.34     10.22     13.36
   9      1.73      2.09      2.70      3.33      4.17      5.90      8.34     11.39     14.68
  10      2.16      2.56      3.25      3.94      4.87      6.74      9.34     12.55     15.99

  11      2.60      3.05      3.82      4.57      5.58      7.58     10.34     13.70     17.28
  12      3.07      3.57      4.40      5.23      6.30      8.44     11.34     14.85     18.55
  13      3.57      4.11      5.01      5.89      7.04      9.30     12.34     15.98     19.81
  14      4.07      4.66      5.63      6.57      7.79     10.17     13.34     17.12     21.06
  15      4.60      5.23      6.27      7.26      8.55     11.04     14.34     18.25     22.31

  16      5.14      5.81      6.91      7.96      9.31     11.91     15.34     19.37     23.54
  17      5.70      6.41      7.56      8.67     10.09     12.79     16.34     20.49     24.77
  18      6.26      7.01      8.23      9.39     10.86     13.68     17.34     21.60     25.99
  19      6.84      7.63      8.91     10.12     11.65     14.56     18.34     22.72     27.20
  20      7.43      8.26      9.59     10.85     12.44     15.45     19.34     23.83     28.41

  21      8.03      8.90     10.28     11.59     13.24     16.34     20.34     24.93     29.62
  22      8.64      9.54     10.98     12.34     14.04     17.24     21.34     26.04     30.81
  23      9.26     10.20     11.69     13.09     14.85     18.14     22.34     27.14     32.01
  24      9.89     10.86     12.40     13.85     15.66     19.04     23.34     28.24     33.20
  25     10.52     11.52     13.12     14.61     16.47     19.94     24.34     29.34     34.38

  26     11.16     12.20     13.84     15.38     17.29     20.84     25.34     30.43     35.56
  27     11.81     12.88     14.57     16.15     18.11     21.75     26.34     31.53     36.74
  28     12.46     13.56     15.31     16.93     18.94     22.66     27.34     32.62     37.92
  29     13.12     14.26     16.05     17.71     19.77     23.57     28.34     33.71     39.09
  30     13.79     14.95     16.79     18.49     20.60     24.48     29.34     34.80     40.26

  40     20.71     22.16     24.43     26.51     29.05     33.66     39.34     45.62     51.80
  50     27.99     29.71     32.36     34.76     37.69     42.94     49.33     56.33     63.17
  60     35.53     37.48     40.48     43.19     46.46     52.29     59.33     66.98     74.40
  70     43.28     45.44     48.76     51.74     55.33     61.70     69.33     77.58     85.53
  80     51.17     53.54     57.15     60.39     64.28     71.14     79.33     88.13     96.58
  90     59.20     61.75     65.65     69.13     73.29     80.62     89.33     98.64    107.56
 100     67.33     70.06     74.22     77.93     82.36     90.13     99.33    109.14    118.50


0.050 


3.84 
5.99 
7.81 
9.49 
11.07 


12.59 
14.07 
15.51 
16.92 
18.31 


19.68 
21.03 
22.36 
23.68 
25.00 


26.30 
27.59 
28.87 
30.14 
31.41 


32.67 
33.92 
35.17 
36.42 
37.65 


38.89 
40.11 
41.34 
42.56 
43.77 


55.76 
67.50 
79.08 
90.53 
101.88 


0.025 


5.02 
7.38 
9.35 
11.14 
12.83 


14.45 
16.01 
17.53 
19.02 
20.48 


21.92 
23.34 
24.74 
26.12 
27.49 


28.85 
30.19 
31.53 
32.85 
34.17 


35.48 
36.78 
38.08 
39.36 
40.65 


41.92 
43.19 
44.46 
45.72 
46.98 


59.34 
71.42 
83.30 
95.02 
106.63 
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0.010 0.005 


6.63 
9.21 
11.34 
13.28 
15.09 


16.81 
18.48 
20.09 
21.67 
23.21 


24.72 
26.22 
27.69 
29.14 
30.58 


32.00 
33.41 
34.81 
36.19 
37.57 


38.93 
40.29 
41.64 
42.98 
44.31 


45.64 
46.96 
48.28 
49.59 
50.89 


63.69 
76.15 
88.38 
100.42 
112.33 


113.14 118.14 124.12 


124.34 


129.56 


135.81 


7.88 
10.60 
12.84 
14.86 
16.75 


18.55 
20.28 
21.96 
23.59 
25.19 


26.76 
28.30 
29.82 
31.32 
32.80 


34.27 
35.72 
37.16 
38.58 
40.00 


41.40 
42.80 
44.18 
45.56 
46.93 


48.29 
49.64 
50.99 
52.34 
53.67 


66.77 
79.49 
91.95 
104.22 
116.32 
128.30 
140.17 


From C. M. Thompson, “Table of Percentage Points of the Chi-Square Distribution.” 
Biometrika, 32, (1941), 188-189. Reprinted by permission. 
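Where software is at hand, the entries of Table IV can be reproduced rather than read off. The sketch below is a minimal, standard-library-only Python illustration and not part of the original table. It assumes the chi-square CDF is computed from the power series for the regularized lower incomplete gamma function P(ν/2, x/2) and then inverts that CDF by bisection. The names `chi2_cdf` and `chi2_critical` are hypothetical. Here `alpha` is the upper-tail area, so `chi2_critical(nu, alpha)` corresponds to χ²[ν, 1 − α].

```python
import math


def chi2_cdf(x: float, df: float) -> float:
    """P(X <= x) for X ~ chi-square(df).

    Uses the series P(a, z) = z^a e^(-z) / Gamma(a) * sum_n z^n / (a (a+1) ... (a+n))
    for the regularized lower incomplete gamma function, with a = df/2, z = x/2.
    """
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of the sum
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)  # ratio of consecutive terms
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_critical(df: float, alpha: float) -> float:
    """Critical value chi2[df, 1 - alpha], i.e. P(X > value) = alpha."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1.0 - alpha:   # bracket the root
        hi *= 2.0
    for _ in range(100):                    # bisection on the CDF
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(round(chi2_critical(15, 0.05), 2))   # example (i)
    print(round(chi2_critical(20, 0.90), 2))   # example (ii)
```

Run as a script, it should print values that round to the two worked examples (25.00 and 12.44). If SciPy is available, `scipy.stats.chi2.ppf(1 - alpha, nu)` returns the same quantity.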
Table V. Critical Values of the F Distribution


This table gives the critical values of the F distribution for degrees of freedom
ν1 = 1 (1) 10, 12, 15, 20, 24, 30, 40, 60, 120, ∞ arranged across the top of the
table and ν2 = 1 (1) 30, 40, 60, 120, ∞ arranged along the left margin of the
table. The α-values are given for upper-tail tests of significance. All the critical
values are provided to two decimal places. The lower-tail critical values are
not given, but can be obtained using the relation F[ν1, ν2; 1 − α] =
1/F[ν2, ν1; α].


[Figure: F density with area 1 − α to the left of the critical value F[ν1, ν2; 1 − α]]


Examples: (i) For ν1 = 6, ν2 = 30, α = 0.05, the desired critical value
from the table is F[6, 30; 0.95] = 2.42.
(ii) For ν1 = 10, ν2 = 60, α = 0.10, the desired critical value
from the table is F[10, 60; 0.90] = 1.71.
(iii) For ν1 = 8, ν2 = 24, α = 0.95, the desired critical value is
obtained as F[8, 24; 0.05] = 1/F[24, 8; 0.95] = 1/3.12 =
0.32.
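The reciprocal relation and the three examples can be checked numerically. The following sketch is only an illustration in standard-library Python. It computes F[ν1, ν2; p], the value with lower-tail area p, by inverting the F CDF. That CDF is written as the regularized incomplete beta function I_x(ν1/2, ν2/2) with x = ν1 f/(ν1 f + ν2), which the sketch evaluates with the usual continued-fraction expansion (modified Lentz method). All helper names (`reg_inc_beta`, `f_cdf`, `f_value`) are hypothetical.

```python
import math


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 1000):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        step = d * c
        h *= step
        if abs(step - 1.0) < 1e-15:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b


def f_cdf(f: float, nu1: float, nu2: float) -> float:
    """P(F <= f) for F ~ F(nu1, nu2)."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(nu1 / 2.0, nu2 / 2.0, nu1 * f / (nu1 * f + nu2))


def f_value(nu1: float, nu2: float, p: float) -> float:
    """F[nu1, nu2; p]: the value with lower-tail area p (bisection on the CDF)."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, nu1, nu2) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, nu1, nu2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(round(f_value(6, 30, 0.95), 2))    # example (i)
    print(round(f_value(10, 60, 0.90), 2))   # example (ii)
    upper = f_value(24, 8, 0.95)
    # example (iii): direct lower-tail value versus the reciprocal relation
    print(round(f_value(8, 24, 0.05), 2), round(1.0 / upper, 2))
```

The last line prints the lower-tail value computed directly next to the reciprocal of the upper-tail value. The two should agree, which is exactly the relation stated above.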
Table VI. Power of the Student’s t Test


This table gives the values of the noncentrality parameter δ of the noncentral
t distribution with degrees of freedom ν = 1 (1) 30, 40, 60, 100, ∞; one-tailed
level of significance α = 0.05, 0.025, 0.01; and power = 1 − β =
0.10 (0.10) 0.90, 0.95, 0.99. Since the distribution of t is symmetrical about
zero, the one-tailed levels of significance also represent two-tailed values of
α = 0.10, 0.05, and 0.02. The table can be used to determine the power of a
test of significance based on the Student’s t distribution. For example, the power
of the t test corresponding to ν = 30, δ = 3.0, and α = 0.05 is approximately
equal to 0.90.


α = 0.05
Power = 1 − β
  ν   0.99   0.95   0.90   0.80   0.70   0.60   0.50   0.40   0.30   0.20   0.10
  1  16.47  12.53  10.51   8.19   6.63   5.38   4.31   3.35   2.46   1.60    .64
  2   6.88   5.52   4.81   3.98   3.40   2.92   2.49   2.07   1.63   1.15    .50
  3   5.47   4.46   3.93   3.30   2.85   2.48   2.13   1.79   1.43   1.02    .46
  4   4.95   4.07   3.60   3.04   2.64   2.30   1.99   1.67   1.34    .96    .43
  5   4.70   3.87   3.43   2.90   2.53   2.21   1.91   1.61   1.29      ?    .42
  6   4.55   3.75   3.33   2.82   2.46   2.15   1.86   1.57   1.26    .90    .41
  7   4.45   3.67   3.26      ?   2.41   2.11   1.82   1.54   1.24    .89    .40
  8   4.38   3.62   3.21   2.73   2.38   2.08   1.80   1.52   1.22    .88    .40
  9   4.32   3.58   3.18   2.70   2.35   2.06   1.78   1.51   1.21    .87    .39
 10   4.28   3.54   3.15   2.67      ?   2.04   1.77   1.49   1.20    .86    .39
 11   4.25   3.52   3.13   2.65   2.31   2.02   1.75   1.48   1.19    .86    .39
 12   4.22   3.50   3.11   2.64   2.30   2.01   1.74   1.47   1.19    .85    .38
 13   4.20   3.48   3.09   2.63   2.29   2.00   1.74   1.47   1.18    .85    .38
 14   4.18   3.46   3.08   2.62   2.28   2.00   1.73   1.46   1.18    .84    .38
 15   4.17   3.45   3.07   2.61   2.27   1.99   1.72   1.46   1.17    .84    .38
 16   4.16   3.44   3.06   2.60   2.27   1.98   1.72   1.45   1.17    .84    .38
 17   4.14   3.43   3.05   2.59   2.26   1.98   1.71   1.45   1.17    .84    .38
 18   4.13   3.42   3.04   2.59   2.26   1.97   1.71   1.45   1.16    .83    .38
 19   4.12   3.41   3.04   2.58   2.25   1.97   1.71   1.44   1.16    .83    .38
 20   4.12   3.41   3.03   2.58      ?   1.97   1.70   1.44   1.16    .83    .38
 21   4.11   3.40   3.03   2.57   2.24   1.96   1.70   1.44   1.16    .83    .38
 22   4.10   3.40   3.02   2.57   2.24   1.96   1.70   1.44   1.16    .83    .37
 23   4.10   3.39   3.02   2.56   2.24   1.96   1.69   1.43   1.15    .83    .37
 24   4.09   3.39   3.01   2.56      ?   1.95   1.69   1.43   1.15    .83    .37
 25   4.09   3.38   3.01   2.56   2.25   1.95   1.69   1.43   1.15    .83    .37
 26   4.08   3.38   3.01   2.55   2.23   1.95   1.69   1.43   1.15    .82    .37
 27   4.08   3.38   3.00   2.55   2.23   1.95   1.69   1.43   1.15    .82    .37
 28   4.07   3.37   3.00   2.55      ?   1.95   1.69   1.43   1.15    .82    .37
 29   4.07   3.37   3.00   2.55   2.22   1.94   1.68   1.42   1.15    .82    .37
 30   4.07   3.37   3.00   2.54   2.22   1.94   1.68   1.42   1.15    .82    .37
 40   4.04   3.35   2.98   2.53   2.21   1.93   1.67   1.42   1.14    .82    .37
 60   4.02   3.33   2.96   2.52   2.19   1.92   1.66   1.41   1.13    .81    .37
100   4.00   3.31   2.95   2.50   2.18   1.91   1.66   1.40   1.13    .81    .37
  ∞   3.97   3.29   2.93   2.49   2.17   1.90   1.64   1.39   1.12    .80    .36
TABLE VI (continued)


α = 0.025
Power = 1 − β
  ν   0.99   0.95   0.90   0.80   0.70   0.60   0.50   0.40   0.30
  1  32.83  24.98  20.96  16.33  13.21  10.73   8.60   6.68   4.91
  2   9.67      ?   6.80   5.65   4.86   4.21   3.63   3.07   2.50
  3   6.88   5.65   5.01   4.26   3.72   3.28   2.87   2.47   2.05
  4   5.94   4.93   4.40   3.76   3.31   2.93   2.58   2.23   1.86
  5   5.49   4.57   4.09   3.51   3.10   2.75   2.43   2.11   1.76
  6   5.22   4.37   3.91   3.37   2.98   2.64   2.34   2.03   1.70
  7   5.06   4.23   3.80   3.27   2.89   2.57      ?   1.98   1.66
  8   4.94   4.14   3.71   3.20   2.83   2.52   2.23   1.94   1.63
  9   4.85   4.07   3.65   3.15   2.79   2.48   2.20   1.91   1.60
 10   4.78   4.01   3.60   3.11   2.75   2.45   2.17   1.89   1.59
 11   4.73   3.97   3.57   3.08   2.73   2.43   2.15   1.87   1.57
 12   4.69   3.93   3.54   3.05   2.70   2.41   2.13   1.85   1.56
 13   4.65   3.91   3.51   3.03   2.69   2.39   2.12   1.84   1.55
 14   4.62   3.88   3.49   3.01   2.67   2.38   2.11   1.83   1.54
 15   4.60   3.86   3.47   3.00   2.66   2.37   2.09   1.82   1.53
 16   4.58   3.84   3.46   2.98   2.65   2.36   2.09   1.81   1.53
 17   4.56   3.83   3.44   2.97   2.64   2.35   2.08   1.81   1.52
 18   4.54   3.82   3.43   2.96   2.63   2.34   2.07   1.80   1.52
 19   4.52   3.80   3.42   2.95   2.61   2.33   2.06   1.80   1.51
 20   4.51   3.79   3.41   2.95   2.61   2.33   2.06   1.79   1.51
 21   4.50   3.78   3.40   2.93   2.60   2.32   2.05   1.79   1.50
 22   4.49   3.77   3.39   2.93   2.60   2.32   2.05   1.78   1.50
 23   4.48   3.77   3.39   2.93   2.59   2.31   2.05   1.78   1.50
 24   4.47   3.76   3.38   2.92   2.59   2.31   2.04   1.78   1.50
 25   4.46   3.75   3.37   2.92   2.58   2.30   2.04   1.77   1.49
 26   4.46   3.75   3.37   2.92   2.58   2.30   2.04   1.77   1.49
 27   4.45   3.74   3.36   2.91   2.58   2.30   2.03   1.77   1.49
 28   4.44   3.73   3.36   2.90   2.57   2.29   2.03   1.77   1.49
 29   4.44   3.73   3.35   2.90   2.57   2.29   2.03   1.77   1.48
 30   4.43   3.73   3.35   2.90   2.57   2.29   2.02   1.76   1.48
 40   4.39   3.69   3.32   2.87   2.55   2.21   2.01   1.75   1.47
 60   4.36   3.66   3.29   2.85   2.53   2.25   1.99   1.73   1.46
100   4.33   3.64   3.27   2.83   2.51   2.23   1.98   1.73   1.45
  ∞   4.29   3.60   3.24   2.80   2.48   2.21   1.96   1.71   1.44
TABLE VI (continued)


The Analysis of Variance 


α = 0.01
Power = 1 − β
  ν   0.99   0.95   0.90   0.80   0.70   0.60   0.50   0.40   0.30   0.20   0.10
  1  82.00  62.40  52.37  40.80  33.00  26.79  21.47  16.69  12.27   8.07   4.00
  2  15.22  12.26  10.74   8.96   7.73   6.73   5.83   4.98   4.12   3.20   2.08
  3   9.34   7.71   6.86   5.87   5.17   4.59   4.07   3.56   3.03   2.44   1.66
  4   7.52   6.28   5.64   4.88   4.34   3.88   3.47   3.06   2.63   2.14   1.48
  5   6.68   5.62   5.07   4.40   3.93   3.54   3.17   2.81   2.42   1.98   1.38
  6   6.21   5.25   4.74   4.13   3.70   3.33   2.99   2.66   2.30   1.88   1.32
  7   5.91   5.01   4.53   3.96   3.55   3.20   2.88   2.56   2.22   1.82   1.27
  8   5.71   4.85   4.39   3.84   3.44   3.11   2.80   2.49   2.16   1.77   1.24
  9   5.56   4.72   4.28   3.75   3.37   3.04   2.74   2.43   2.11   1.74   1.22
 10   5.45   4.63   4.20   3.68   3.31   2.99   2.69   2.39   2.08   1.71   1.20
 11   5.36   4.56   4.14   3.63   3.26   2.94   2.65   2.36   2.05   1.69   1.18
 12   5.29   4.50   4.09   3.58   3.22   2.91   2.62   2.33   2.03   1.67   1.17
 13   5.23   4.46   4.04   3.55   3.19   2.88   2.60   2.31   2.01   1.65   1.16
 14   5.18   4.42   4.01   3.51   3.16   2.86   2.57   2.29   1.99   1.64   1.15
 15   5.14   4.38   3.98   3.49   3.14   2.84   2.56   2.28   1.98   1.63   1.14
 16   5.11   4.35   3.95   3.47   3.12   2.82   2.54   2.26   1.97   1.62   1.14
 17   5.08   4.33   3.93   3.45   3.10   2.80   2.53   2.25   1.96   1.61   1.13
 18   5.05   4.31   3.91   3.43   3.09   2.79   2.52   2.24   1.95   1.60   1.13
 19   5.03   4.29   3.89   3.42   3.07   2.78   2.50   2.23   1.94   1.60   1.12
 20   5.01   4.27   3.88   3.40   3.06   2.77   2.50   2.22   1.93   1.59   1.12
 21   4.99   4.25   3.86   3.39   3.05   2.76   2.49   2.22   1.92   1.59   1.11
 22   4.97   4.24   3.85   3.38   3.04   2.75   2.48   2.21   1.92   1.58   1.11
 23   4.96   4.23   3.84   3.37   3.03   2.74   2.47   2.20   1.91   1.58   1.11
 24   4.94   4.22   3.83   3.36   3.02   2.73   2.47   2.20   1.91   1.57   1.11
 25   4.93   4.20   3.82   3.35   3.02   2.73   2.46   2.19   1.90   1.57   1.10
 26   4.92   4.19   3.81   3.34   3.01   2.72   2.45   2.19   1.90   1.57   1.10
 27   4.91   4.19   3.80   3.34   3.00   2.72   2.45   2.18   1.90   1.56   1.10
 28   4.90   4.18   3.79   3.33   3.00   2.71   2.44   2.18   1.89   1.56   1.10
 29   4.89   4.17   3.79   3.32   2.99   2.71   2.44   2.17   1.89   1.56   1.10
 30   4.88   4.16   3.78   3.32   2.99   2.70   2.44   2.17   1.89   1.55   1.09
 40   4.82   4.11   3.74   3.28   2.95   2.67   2.41   2.15   1.86   1.54   1.08
 60   4.76   4.06   3.69   3.24   2.92   2.64   2.38   2.12   1.84   1.52   1.07
100   4.72   4.03   3.66   3.21   2.89   2.62   2.36   2.10   1.83   1.51   1.06
  ∞   4.65   3.97   3.61   3.17   2.85   2.58   2.33   2.07   1.80   1.48   1.04


From D. B. Owen, “The Power of Student’s t Test,” Journal of the American
Statistical Association, 60 (1965), 320-333. Reprinted by permission.
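The noncentral t probabilities behind Table VI can also be approximated directly. The following is a minimal sketch, assuming that the one-tailed test rejects when T > t[ν, 1 − α] and that the noncentral t is written as T = (Z + δ)/S, where S = √(V/ν) and V ~ χ²(ν). Under these assumptions the power is the average of Φ(δ − t·S) over the distribution of S, and the code evaluates that average with Simpson's rule. Running the same routine with δ = 0 gives the central critical value. The function names and the integration settings (upper limit, number of panels) are illustrative choices and are not taken from the source.

```python
import math


def _phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def _s_density(s: float, nu: int) -> float:
    """Density of S = sqrt(V / nu), where V ~ chi-square(nu)."""
    if s <= 0.0:
        return math.sqrt(2.0 / math.pi) if nu == 1 else 0.0
    log_g = (math.log(2.0) + 0.5 * nu * math.log(0.5 * nu)
             + (nu - 1) * math.log(s) - 0.5 * nu * s * s
             - math.lgamma(0.5 * nu))
    return math.exp(log_g)


def t_upper_tail(t: float, nu: int, delta: float = 0.0, n: int = 2000) -> float:
    """P(T > t) for T = (Z + delta) / S, by composite Simpson's rule in s.

    Given S = s, P(Z + delta > t s) = Phi(delta - t s); this is averaged over S.
    """
    upper = 1.0 + 12.0 / math.sqrt(nu)   # generous cutoff; S has mean near 1
    h = upper / n                        # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        s = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * _phi(delta - t * s) * _s_density(s, nu)
    return total * h / 3.0


def t_critical(nu: int, alpha: float) -> float:
    """One-tailed critical value t[nu, 1 - alpha] of the central t (alpha < 0.5)."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, nu) > alpha:   # bracket the root
        hi *= 2.0
    for _ in range(50):                   # bisection
        mid = 0.5 * (lo + hi)
        if t_upper_tail(mid, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def t_power(nu: int, delta: float, alpha: float) -> float:
    """Power of the one-tailed t test at noncentrality delta."""
    return t_upper_tail(t_critical(nu, alpha), nu, delta)


if __name__ == "__main__":
    print(round(t_critical(30, 0.05), 3))
    print(round(t_power(30, 3.0, 0.05), 2))
```

With ν = 30, δ = 3.0 and α = 0.05, the computed power should be close to the 0.90 quoted in the description of Table VI. Small differences come from the two-decimal rounding of δ in the table.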
Table VII. Power of the Analysis of Variance F Test


This table gives the values of the type II error (β) of a test of significance based on
the F distribution corresponding to the numerator degrees of freedom ν1 = 1
(1) 10 (2) 12; denominator degrees of freedom ν2 = 2 (2) 30, 40, 60, 120, ∞;
standardized noncentrality parameter φ = 0.5 (0.5) 1.0 (0.2) 2.2 (0.4) 3.0;
and the level of significance α = 0.01, 0.05, 0.10. For example, the power of
the F test corresponding to ν1 = 3, ν2 = 30, φ = 1.4, and α = 0.05 is equal to
1 − 0.4182 = 0.5918. To obtain power for odd values of ν2, a linear interpolation
in the reciprocal of ν2 may be used, which generally gives three-decimal-place
accuracy. To obtain power for values of φ not given in the table (0.5 < φ < 3.0),
a three-point Lagrangian interpolation may be used, which generally gives an
accuracy of at least two decimal places. For φ > 3, the values of power are
mostly close to one.


α = 0.01
ν2   φ = 0.5    1.0    1.2    1.4    1.6    1.8    2.0    2.2    2.6    3.0
ν1 = 1
2 9851 .9705 .9620 9521 .9408 9282 .9143 .8991 .8654 8277 
4 .9809 .9492 .9280 9012 8682 .8292 .7843 7341 .6216 5014 
6 .9782 .9340 .9030 .8629 8131 7541 .6870 .6136 4589 3125 
8 .9764 9236 .8859 .8367 717159 .7043 .6242 5387 3678 2211 
10 9752 9163 .8738 .8184 .7501 .6704 5824 4904 3136 .1725 
12 9743 9109 .8650 .8050 7314 .6462 .5532 4574 .2787 .1437 
14 .9736 .9068 .8582 .7949 7174 6283 5318 4336 2547 .1250 
16 .9730 .9036 .8529 .7870 .7066 .6145 5156 4158 .2374 1121 
18 9726 .9010 .8487 .7807 .6979 .6036 5028 .4020 .2243 .1027 
20 .9723 8989 8452 .7755 .6908 5947 4925 3910 2141 .0957 
22 .9720 8971 .8423 7712 .6850 5874 .4841 3820 .2060 .0902 
24 9717 8956 .8398 .7675 .6801 5813 4771 .3746 .1994 0858 
26 9715 8943 .8377 .7644 .6758 .5760 4712 3683 .1938 .0822 
28 9713 8931 .8359 7617 .6722 5716 .4661 .3630 .1892 .0792 
30 9711 8922 8343 .7593 .6690 5677 .4617 3584 .1852 .0767 
40 9705 .8886 .8285 .7509 .6578 5539 .4462 .3424 .1718 .0683 
60 .9699 8850 .8226 .7423 .6463 5401 4308 3267 .1590 .0608 
120 .9693 8812 .8165 .7335 .6347 5261 .4155 3113 .1468 .0539 
∞ .9687 .8773 .8102 .7244 .6229 .5120 .4003 .2962 .1354 .0478
ν1 = 2
2 .9863 .9753 .9688 .9613 9527 .9430 .9323 .9207 8945 .8650 
4 9828 9567 .9386 9153 8862 8511 .8100 7635 6571 5401 
6 .9803 .9409 .9118 .8730 .8237 -7640 .6951 6191 .4576 3052 
8 .9784 .9288 .8910 .8401 .7754 .6982 .6110 .5182 3358 .1869 
10 .9770 .9196 8751 .8150 .7393 .6500 5515 .4498 .2626 .1268 
12 .9760 .9124 .8627 .7957 7118 .6142 5085 .4022 .2163 .0934 
14 .9752 .9067 .8529 .7806 .6905 5869 .4765 3678 .1854 .0733 
16 9745 9021 8450 .7684 .6736 5655 4519 .3420 -1636 .0603 
18 .9740 8983 8386 .7585 .6600 5485 4326 3221 .1476 0513 
20 9735 8951 8331 .7502 .6486 5345 .4170 3063 .1354 .0449 
22 9731 8924 .8285 .7433 .6392 5229 4042 .2936 .1260 .0401 
24 .9728 8901 .8246 .7373 6312 5132 3936 .2830 .1184 .0364 
26 9725 8881 8212 .7322 .6243 5048 .3845 .2742 .1122 .0335 
28 9723 8863 8182 7277 .6182 4976 .3768 .2667 .1070 .0312 
30 9721 8848 8156 .7238 .6130 4914 .3701 .2603 .1027 .0293 
40 9713 8791 .8060 -7096 5943 .4693 .3468 .2382 0885 .0233 
60 .9704 8731 .7960 .6948 5749 .4469 3237 .2170 .0757 .0183 
120 .9695 8668 .7854 .6794 5551 4244 3011 .1968 .0643 .0143 
∞ .9686 .8600 .7743 .6634 .5349 .4019 .2789 .1776 .0543 .0111
ν1 = 3
2 .9867 .9769 9711 .9644 .9567 9481 .9385 .9280 .9045 .8779 
4 9835 9592 9421 .9199 8919 8580 8181 .7726 .6678 5517 
6 .9809 9427 .9136 .8742 8237 -7620 -6906 .6117 4448 .2899 
8 .9790 9291 .8896 .8357 1665 6835 .5902 4917 3032 .1576 
10 9775 9181 .8703 .8047 7214 5234 5166 -4085 2191 .0941 
12 .9763 .9093 .8547 -7800 .6861 5776 .4625 3504 .1675 .0615 
14 9753 9021 .8419 -7600 .6580 542] .4220 3086 .1343 0434 
16 .9746 8961 8314 .7437 .6354 5142 3910 .2776 1118 .0325 
18 .9739 8910 .8227 -7302 .6169 4917 .3666 .2540 .0958 .0256 
20 9734 8868 8152 7188 .6016 4733 347) 2355 0841 0209 
22 .9729 8831 .8089 .7092 5886 4580 3311 .2207 0752 0175 
24 9725 .8799 .8034 .7008 5776 4451 3178 .2087 .0683 0151 
26 9721 8772 .7986 .6936 5681 .4341 .3066 .1987 .0628 .0132 
28 9718 .8747 .7944 .6873 5599 4247 .2971 .1903 .0584 0118 
30 9716 8725 -7906 .6817 5526 .4164 .2889 .1831 .0547 .0107 
40 .9705 8645 .7769 .6614 5266 3873 .2606 .1590 .0430 .0074 
60 .9694 8558 .7622 .6400 .4997 3582 .2332 .1367 0334 .0050 
120 .9682 8464 .71464 .6175 4721 3292 .2070 .1163 0255 .0033 


∞ .9669 .8361 .7295 .5938 .4439 .3005 .1821 .0978 .0192 .0022




TABLE VII (continued) 




α = 0.01
ν2   φ = 0.5    1.0    1.2    1.4    1.6    1.8    2.0    2.2    2.6    3.0
ν1 = 4
2 .9869 .9777 .9723 .9660 .9587 .9506 .9416 .9317 .9096 .8844
4 9838 .9604 9438 9221 8946 8612 8217 7767 6725 5566 
6 9812 9433 .9139 8738 8219 7585 .6848 .6036 4330 .2767 
8 9792 9284 8873 .8306 £7575 .6697 5716 4691 2776 1363 
10 9776 .9160 8650 .7944 .1047 5996 4885 3745 1867 .0726 
12 9763 .9056 8464 .7647 .6622 5452 4236 3087 .1330 0424 
14 9752 8969 .8309 .7403 6281 5027 3765 .2620 .0998 .0270 
16 9743 8896 8178 .7200 .6003 4691 3405 .2279 .0783 0184 
18 9736 8834 .8068 .7030 5774 4420 3124 .2023 .0636 0133 
20 .9730 .8780 .1974 .6886 5583 4199 .2901 1825 .0533 .0101 
22 9724 8734 7892 .6763 5421 4015 .2719 1670 0457 .0079 
24 9719 8693 .7821 .6656 5283 .3861 .2570 1545 .0400 .0064 
26 9715 8657 1759 .6563 5164 3730 .2445 1442 .0355 0054 
28 9711 8625 .7704 6482 .5060 3617 .2340 1357 .0320 .0046 
30 .9708 8597 7655 .6409 .4969 3519 .2249 1286 0292 .0040 
40 .9695 849] 7473 6145 4643 3176 1942 1051 .0207 .0023 
60 .9682 8374 7275 5864 4306 .2836 1653 0844 0143 0013 
120 .9666 8245 .7060 5566 .3962 .2504 .1386 .0665 .0096 .0007 
∞ .9649 .8103 .6828 .5253 .3614 .2185 .1144 .0513 .0063 .0004
ν1 = 5
2 .9870 9782 .9730 .9669 .9600 9521 9435 9340 9126 8884 
4 .9840 .9611 9448 9233 8961 8629 8237 .7789 .6749 5591 
6 9814 9435 9138 8729 8199 .7550 6795 5965 4231 .2663 
8 .9793 .9276 8850 8258 £7494 6578 5559 4504 2575 1207 
10 .9776 9138 .8600 .7852 .6899 5792 4615 3471 1625 058 | 
12 .9762 9021 8387 7510 .6413 5174 3914 2757 1084 .0306 
14 .9750 8920 8206 .7224 .6017 .4690 339] 2257 .0763 0176 
16 .9740 8835 8052 .6984 5692 4306 .2994 1898 0564 0109 
18 9732 8761 .7920 .6782 5423 3998 .2688 1633 0435 0073 
20 9725 8696 .7806 .6609 5198 3746 .2446 1433 0347 .005 1 
22 9718 .8640 .7706 .6460 5007 3538 2252 1278 0284 .0037 
24 9713 8590 7619 .6330 4844 .3363 .2093 1156 0239 0029 
26 .9708 8546 £71543 6217 .4704 3216 1962 .1057 0205 .0023 
28 .9704 8507 .7474 6118 4581 .3089 1851 .0976 0179 0018 
30 9700 8472 7413 6029 4474 .2980 1758 .0909 0158 0015 
40 9685 8339 .7186 5705 .4090 .2601 .1446 0695 .0100 .0007 
60 .9668 8189 .6935 53359 .3696 2232 1163 0516 .0061 .0003 
120 .9650 8023 .6660 4992 3298 1882 0913 0372 .0035 0002 
∞ .9628 .7835 .6360 .4606 .2901 .1556 .0699 .0259 .0020 .0000
ν1 = 6
2 .9871 .9785 .9735 .9675 .9608 .9532 .9447 .9355 .9147 .8910
4 .9841 .9616 9454 9241 .8971 .8640 8250 7802 6764 5605 
6 9815 9435 9135 8720 8180 7518 .6749 5905 4149 .2578 
8 .9794 9267 8826 8216 .7424 6477 5428 4351 .2417 .1090 
10 .9776 9118 8556 .1770 .6772 618 4406 3248 1442 .0480 
12 .9761 8988 8318 .7388 .6230 4938 3647 .2492 .0905 .0230 
14 .9748 8876 8113 .1064 5785 4402 3083 1971 .0600 0120 
16 9737 8778 .7936 .6790 5418 3978 .2659 .1604 0419 .0068 
18 .9728 8692 7783 .6556 5113 3639 .2335 1339 .0306 0042 
20 .9720 8617 .7650 .6356 4857 .3363 .2082 1142 0232 .0027 
22 9713 8551 7533 .6183 4641 .3136 1881 0992 0182 0019 
24 .9707 8493 .7430 .6032 .4456 .2946 .1719 .0876 0147 0013 
26 9701 844] .7339 5900 4297 .2787 1586 .0784 0121 .0010 
28 .9696 .8394 .7258 5783 4158 2651 .1476 .0710 0102 .0008 
30 .9692 8351 7185 5679 4037 .2533 .1383 .0649 0088 .0006 
40 .9675 8191 6911 5299 3605 .2133 .1080 .0462 0049 .0002 
60 9655 .8008 .6607 4891 .3166 1753 .08 16 0315 .0026 .0001 
120 .9633 .7800 .6271 4459 .2728 .1402 .0595 .0205 .0013 .0000 
∞ .9607 .7563 .5901 .4009 .2301 .1089 .0417 .0128 .0006 .0000
ν1 = 7
2 .9872 9787 .9738 .9680 .9614 .9539 .9456 .9365 9161 8929 
4 9842 .9619 9459 9247 8977 8648 8258 7811 .6773 5613 
6 .9816 .9435 9132 8711 8163 .7490 .6710 5854 4082 .2510 
8 9794 .9260 8810 8179 .7363 .6390 5317 4224 .2290 .1000 
10 9775 .9100 8516 1699 .6661 5469 4231 .3065 1299 .0407 
12 .9760 8959 8256 .7280 .6070 4735 3423 .2278 .0770 0178 
14 .9746 8835 8029 .6922 5581 4156 .2828 .1744 0482 .0085 
16 9735 8726 7831 6615 5176 3698 2385 1375 0319 0044 
18 .9724 8629 7658 .6353 4840 .3333 .2049 1113 0221 0025 
20 .9716 8544 .7506 6127 4558 .3038 1791 0923 .0160 0015 
22 .9708 8469 1373 5931 4319 .2796 1587 0781 0120 .0010 
24 .9701 8401 7254 5760 4115 .2596 1425 .0673 .0093 .0006 
26 9695 834] 7149 5610 3940 2429 1294 .0589 .0074 .0005 
28 .9689 8287 £7055 5477 .3787 .2286 .1186 0522 .0060 .0003 
30 .9684 8237 .6971 5359 3654 .2165 .1096 .0469 0050 .0002 
40 .9665 8048 .665 1 4926 3182 1754 0810 .0309 0025 .0001 
60 .9642 .7830 6294 4462 .2709 1375 0572 .0193 0011 .0000 
120 .9616 .7580 5896 3973 .2246 .1038 0384 0112 .0005 .0000 
∞ .9585 .7290 .5456 .3467 .1807 .0751 .0244 .0062 .0002 .0000
TABLE VII (continued)


α = 0.01
ν2   φ = 0.5    1.0    1.2    1.4    1.6    1.8    2.0    2.2    2.6    3.0
ν1 = 8
2 .9872 .9789 .9740 .9683 .9618 9544 .9463 .9373 9172 8943 
4 .9843 9621 .9462 9251 8982 8653 .8264 7817 .6779 5619 
6 9817 9435 9128 .8703 8148 -7467 .6676 811 4026 2454 
8 .9794 9252 .8793 8147 7311 6315 5223 AIT .2186 .0928 
10 9775 .9084 8481 .7635 .6564 5341 4082 2912 .1185 0352 
12 9758 8933 8201 -7184 5930 4559 3234 2101 .0668 0142 
14 .9744 8798 .7953 .6794 5401 3943 .2614 .1560 .0396 .0063 
16 .9732 8678 7735 .6458 4963 3458 .2157 1193 0248 .0030 
18 9721 8571 .7543 .6169 4598 3072 1815 .0938 0163 .0016 
20 9711 8477 .7374 5919 4292 .2762 .1554 .0756 0113 .0009 
22 .9703 8392 1224 5702 4034 .2510 .1352 .0624 .008 1 0005 
24 .9695 8316 7091 5512 3814 .2302 .1193 0524 .0060 .0003 
26 .9689 8247 .6972 5345 3625 .2129 .1065 0449 .0046 .0002 
28 9682 8185 .6866 5198 3461 .1983 .0962 0389 .0036 .0001 
30 .9677 8129 .6770 5067 3318 1859 .0876 0343 .0029 .0001 
40 9655 7911 .6406 4584 2815 .1447 0611 .0209 0012 .0000 
60 .9630 7658 5995 .4070 .2318 .1079 .0402 0118 .0005 .0000 
120 .9600 .7362 .5536 3531 1843 .0765 0247 .006 1 .0002 .0000 
∞ .9563 .7016 .5028 .2981 .1406 .0512 .0141 .0029 .0000 .0000

ν1 = 9
2 .9872 9790 .9742 .9686 .9621 9549 .9468 .9380 9181 8954 
4 .9843 9623 .9464 9254 8986 8657 .8268 .7821 .6783 5623 
6 9817 9434 9125 8696 8135 .7446 .6647 5774 3978 .2407 
8 .9794 .9246 .8778 8119 7265 6251 5142 -4025 .2099 .0871 
10 .9774 .9070 8450 .7579 .6479 5229 3953 .2782 .1093 .0310 
12 9757 8909 8151 -7098 5806 4407 3073 1955 0587 .0116 
14 .9742 8764 .7884 .6679 5242 3759 .2433 .1410 0331 .0047 
16 9729 8634 .7647 .6316 A774 3249 1966 .1047 .0197 .0021 
18 9718 8518 .7438 -6002 4384 .2847 1621 -0800 0124 .0010 
20 .9708 8414 £7252 .5730 4057 2525 1361 .0628 .008 1 .0005 
22 .9698 .8320 .7086 .5493 .3782 .2265 .1162 .0504 .0056 .0003 
24 .9690 8235 .6939 5286 3548 2053 .1007 0414 .0040 .0002 
26 .9683 8159 .6807 5104 .3347 .1877 0885 .0346 .0029 .0001 
28 .9676 .8090 .6689 4943 3174 .1730 .0786 0294 0022 .0001 
30 .9670 .8026 -6581 .4799 3023 .1606 .0706 0254 .0017 .0000 
40 .9646 .7780 .6174 4273 .2497 1199 .0464 0143 .0006 .0000 
60 .9618 .7490 5712 3714 .1986 .0848 0283 .0073 .0002 .0000 
120 9583 .7148 5194 3133 -1508 0562 0158 .0033 .0001 .0000 
∞ .9542 .6745 .4620 .2549 .1085 .0345 .0080 .0014 .0000 .0000

ν1 = 10
2 .9873 9791 9744 .9688 .9624 9552 .9472 9385 9188 8963 
4 9844 .9625 .9466 .9256 .8989 .8660 8271 £7825 .6786 5625 
6 9817 9433 9123 .8690 8124 .7428 .6622 5742 3938 .2367 
8 .9794 .9240 .8765 .8094 7225 .6195 5072 3947 .2026 0823 
10 .9774 9057 8422 .7529 .6404 5131 3842 .2672 1017 0277 
12 .9756 8888 .8106 .7022 5696 4273 .2935 1831 0523 .0097 
14 9741 8734 -7822 .6575 5101 3597 .2279 1285 0282 .0037 
16 9727 8594 7567 .6187 4605 .3068 -1805 .0928 .0160 .0015 
18 9715 8469 .7341 5850 4193 .2652 .1459 .0690 .0096 .0007 
20 .9704 8355 .7139 5558 3848 2322 .1201 .0527 .0060 .0003 
22 .9694 8253 .6959 .5303 3558 .2057 .1007 0413 .0039 .0002 
24 .9685 8160 .6798 5080 3312 1841 0858 0331 .0027 .0001 
26 .9677 .8076 .6653 4883 3102 .1665 .0741 .0270 .0019 .0001 
28 .9670 .7999 .6523 .4709 2921 1518 .0649 .0225 .0014 .0000 
30 .9664 .7929 .6405 4555 .2764 1395 .0574 .0190 .0010 .0000 
40 .9638 7655 5955 .3989 .2220 0998 .0355 .0098 .0003 .0000 
60 .9606 1328 5443 3390 1702 .0668 .0200 0045 .0001 .0000 
120 .9568 .6939 4869 .2776 A232 0411 .0101 0018 .0000 .0000 
∞ .9520 .6475 .4234 .2170 .0832 .0230 .0045 .0007 .0000 .0000

ν1 = 12
2 .9873 .9793 .9746 .9691 .9628 9557 .9478 .9392 .9198 8977 
4 .9844 .9627 .9469 .9260 8993 8665 .8276 7829 .6790 5628 
6 .9818 .9432 9118 .8679 8104 7398 .6581 .5689 3872 .2304 
8 9794 9231 8743 8052 7158 6101 4956 3819 .1910 .0750 
10 9773 .9035 .8375 .7445 6277 4968 3661 .2494 .0900 0228 
12 9754 8850 .8029 .6890 5509 4050 .2708 .1635 0429 .007 1 
14 .9738 8680 7713 .6396 4860 3329 -2030 .1092 0212 .0024 
16 9723 8523 .7426 5963 4318 .2768 .1549 .0749 0110 .0009 
18 .9710 .8380 .7169 5585 3868 2333 .1206 .0529 .0060 .0003 
20 .9698 8250 .6938 5256 3493 1991 .0958 .0384 .0035 .0001 
22 .9687 8131 .6730 .4969 3179 1721 .0775 .0286 .0021 .0001 
24 .9677 8023 6543 4717 2914 1505 .0638 .0219 .0013 .0000 
26 .9668 .71924 .6376 .4496 .2690 .1330 0533 .O171 0009 .0000 
28 9659 .7833 .6223 4300 .2498 1187 .0452 .0136 .0006 .0000 
30 9652 .7750 .6086 4127 2333 -1069 .0389 0110 .0004 .0000 
40 .9622 7419 9556 .3492 .1770 .0701 0212 .0048 .0001 .0000 
60 9584 1019 4951 2833 1256 0417 0101 0018 0000 .0000 
120 9537 6534 4271 .2173 .0819 .0220 .004 1 .0005 .0000 .0000 


∞ .9476 .5948 .3530 .1552 .0479 .0100 .0015 .0001 .0000 .0000


α = 0.05
ν2   φ = 0.5    1.0    1.2    1.4    1.6    1.8    2.0    2.2    2.6    3.0
ν1 = 1
2 9271 8617 8256 7847 7402 .6927 .6432 5926 AQIS .3950 
4 9141 8048 -TAIS 6694 5910 5095 4284 3509 .2169 1198 
6 .9077 .7768 .7010 6153 5238 4315 3431 .2629 1374 0611 
8 .9040 7610 6784 5858 .4883 3916 3015 2223 1054 0413 
10 .9017 7510 .6642 5675 .4666 .3680 .2775 1997 .0890 0322 
12 .9000 7440 .6544 5551 4521 3524 .2620 1854 0793 0272 
14 8988 71390 .6474 5462 4418 3414 2513 .1756 .0728 .0240 
16 8979 7351 .6420 5394 4341 .3333 2433 1685 .0683 0219 
18 8972 .7321 .6379 5342 4281 3270 .2373 1631 .0649 .0203 
20 8966 7297 6345 5300 4233 3220 2325 .1589 .0623 0192 
22 8961 7277 .6317 5265 4194 3180 2287 1555 .0603 0183 
24 8957 .7260 .6294 5236 4161 3146 2255 1527 0586 0175 
26 8954 7246 6274 212 4134 3118 .2228 1504 .0573 0169 
28 8951 71233 6258 5192 Alll .3094 .2206 1485 .0561 0165 
30 8948 7223 .6243 5173 .4090 3073 .2186 .1468 0551 0160 
40 8939 7185 6192 5110 .4020 3001 2119 1410 0518 0147 
60 8930 7147 .6140 5047 .3949 .2930 2053 1354 0487 0134 
120 8920 .7108 .6087 4983 .3879 .2859 .1988 1300 0457 0123 
∞ .8910 .7070 .6036 .4920 .3810 .2791 .1926 .1248 .0430 .0112
ν1 = 2
2 .9324 8814 8527 8201 .7840 7451 .7038 .6608 5722 4837 
4 9201 8239 1657 .6976 6219 5414 4598 3804 .2400 1353 
6 9129 7891 7135 .6257 5303 4330 3396 .2554 1264 0520 
8 .9083 1672 .6810 5821 4769 3729 .2773 1955 .0821 .0273 
10 9052 .7523 6592 5536 .4430 3361 .2408 -1624 .0609 0175 
12 .9030 TAIT .6438 5336 4197 3115 .2173 1419 0490 0126 
14 9013 71337 6323 5189 4028 2941 .2010 1281 .0416 .0099 
16 -.9000 7274 6234 5077 3901 2812 1892 1183 .0367 0082 
18 8989 7225 .6164 4988 3802 .2713 1802 1110 .0331 .007 1 
20 8980 .7184 .6107 4917 3723 .2634 1732 1054 0305 .0063 
22 8973 .7150 .6059 4858 3658 .2570 .1675 . 1009 0285 .0057 
24 8967 7122 .6019 4808 3603 2517 .1629 .0973 .0269 .0052 
26 8961 £7097 5985 4767 3558 2472 .1590 .0943 .0256 .0048 
28 8957 .7076 5956 4730 3518 .2434 1558 .0918 0245 0045 
30 8953 7058 5930 .4699 3484 2401 .1530 .0896 .0236 .0043 
40 8938 6992 5839 4588 3365 .2288 1434 0824 .0207 .0035 
60 8923 6924 5746 .4476 3247 2177 1341 .0756 0181 .0029 
120 8908 .6855 5651 .4364 3129 .2069 1253 .0692 0157 0024 
∞ .8892 .6785 .5556 .4251 .3013 .1963 .1168 .0632 .0137 .0019
ν1 = 3
2 9342 8882 8623 8327 .7998 -7640 .7260 .6861 .6030 5187 
4 9221 8302 .7735 .7064 6311 5505 .4683 3880 .2453 1384 
6 9144 .7909 .7134 .6226 5235 4225 3264 .2407 1132 0435 
8 9092 71643 .6733 5683 -4570 3482 .2504 1694 .0639 .0184 
10 9056 7454 .6453 5314 4134 3019 .2059 1307 .0419 .0098 
12 9028 .7314 .6249 .5050 3831 .2709 .1776 .1074 .0305 .0061 
14 .9007 71207 .6093 4853 3611 .2490 1583 0922 0238 0042 
16 8990 7122 5972 4701 3443 .2328 1444 .0817 .0196 .0031 
18 8976 .7054 5874 4581 3313 .2204 1340 .0740 0167 0025 
20 8965 .6997 5794 4483 3208 .2106 1259 .0682 .0146 .0020 
22 8955 .6950 5728 4402 3122 .2026 1195 .0637 .0131 0017 
24 8947 .6909 5671 .4333 3051 1961 1143 .0601 O119 0015 
26 8940 .6875 5623 4275 .2990 1907 .1100 0571 0110 0013 
28 8934 .6845 5581 4225 .2938 1860 .1064 0547 .0103 .0012 
30 8928 6818 5544 4182 .2894 1820 1033 0526 .0097 0011 
40 8909 .6723 5414 4028 .2738 1684 .0930 .0458 .0078 .0008 
60 8888 .6624 5279 3872 2583 1552 .0833 .0397 0062 .0006 
120 8866 6522 5142 3716 2431 1425 .0743 .0342 0049 .0004 
∞ .8843 .6415 .5000 .3557 .2280 .1304 .0659 .0293 .0038 .0003
ν1 = 4
2 9351 8917 8672 8391 .8079 .7738 .7375 .6993 6193 5374 
4 9232 8332 7771 .7103 .6350 5542 4714 3905 2466 1389 
6 9151 .7906 7112 6178 5158 4122 3143 2282 .1030 0375 
8 9094 .7602 .6649 5549 4389 3271 .2286 .1493 0515 0132 
10 .9052 .7378 6315 5110 3876 .2736 1788 .1076 .0301 0059 
12 .9020 7208 .6066 4791 3516 2380 1475 .0833 0199 0032 
14 8995 .7076 5875 4550 3253 2129 1266 .0680 0143 0019 
16 8975 .6970 5723 .4363 .3054 1945 1118 0577 0109 .0013 
18 8958 .6883 5600 4214 2898 1804 .1009 .0503 0087 .0009 
20 8945 6811 5498 .4092 .2774 1695 .0926 .0449 .0073 .0007 
22 8933 .6750 5413 3991 .2672 1607 .0861 .0408 .0062 .0006 
24 8923 .6698 5341 3907 2587 1535 .0808 .0376 0054 .0005 
26 8914 .6653 5279 3834 .2516 1475 .0765 .0349 .0049 0004 
28 8906 .6614 5225 3772 2455 1424 .0730 .0328 0044 .0003 
30 8899 .6579 5178 3718 .2402 1381 .0700 0311 .0040 0003 
40 8874 6454 5009 3526 2219 1234 .0601 .0254 0029 .0002 
60 8848 .6322 4833 3332 .2040 1095 0511 .0206 .002 1 .0001 
120 8819 6183 4652 3136 1865 .0965 0431 .0164 0015 .0001 
∞ .8789 .6038 .4466 .2940 .1695 .0844 .0360 .0130 .0011 .0000
TABLE VII (continued)


α = 0.05
ν2   φ = 0.5    1.0    1.2    1.4    1.6    1.8    2.0    2.2    2.6    3.0
ν1 = 5
2 .9356 8939 .8702 8431 8128 .7798 7445 -7074 .6293 5490 
4 9238 8349 7791 .7124 .6369 5558 4727 3914 .2467 1386 
6 9154 -7897 .7087 .6131 5088 4033 3044 2181 .0952 .0333 
8 .9093 7561 .6573 5432 4237 3099 2115 1342 .0430 0100 
10 .9048 .7308 .6193 4933 3660 .2509 .1579 .0909 0227 .0038 
12 9012 7111 .5904 .4566 3254 2118 1249 .0665 .0136 .0018 
14 8983 .6956 5679 4287 .2957 1845 .1033 .0516 .0090 .0010 
16 8960 .6829 5499 .4069 .2732 1646 .0883 0418 .0064 .0006 
18 8941 6725 5352 3895 2557 .1496 .0774 0351 .0048 .0004 
20 8924 .6638 5231 3753 2417 .1380 .0693 .0303 .0038 .0003 
22 8910 .6564 5129 3635 .2303 1288 .0630 .0267 .0031 .0002 
24 8898 .6501 5042 3536 .2209 1213 .0580 .0240 .0026 .0001 
26 8888 6445 .4967 3451 .2129 151 .0540 .0218 .0022 .0001 
28 .8878 .6397 .4901 .3379 .2062 .1099 .0507 .0201 .0019 .0001 
30 8870 .6354 4844 3315 .2003 1055 .0479 .0186 0017 .0001 
40 8840 .6198 .4638 3091 .1803 .0908 .0390 0142 0011 .0000 
60 8807 .6033 4423 .2864 .1609 .0772 .0313 .0106 .0007 .0000 
120 8771 5857 4201 .2638 1422 .0649 0247 .0078 0004 .0000 
∞ .8733 .5671 .3971 .2412 .1245 .0538 .0192 .0056 .0003 .0000
ν1 = 6
2 .9360 8953 .8722 .8457 8161 .7839 .7493 .7129 .6361 5569 
4 9242 8361 .7803 .7136 .6380 5567 4733 3916 .2464 1381 
6 9156 7887 .7063 .6090 5028 3959 .2962 .2100 0893 .0301 
8 .9092 £7525 .6506 5332 4109 .2958 .1978 1225 .0369 .0080 
10 9042 7245 -.6086 4782 3480 .2325 1417 .0784 .0177 .0026 
12 .9003 71024 .5761 4373 3036 .1908 .1077 0544 .0097 .0011 
14 .8972 .6847 .5506 4061 2711 .1619 0859 .0401 .0059 .0005 
16 8946 .6702 5301 3816 2465 1412 .0710 .0312 .0039 .0003 
18 8924 6582 5132 3621 2275 1257 .0605 0252 .0028 .0002 
20 8905 .6480 .4992 3461 .2124 1139 0528 .0210 .0021 .0001 
22 8889 .6394 4874 .3328 .2001 1045 .0469 .0179 .0016 .0001 
24 8875 .6319 4773 3216 .1900 .0970 .0423 0157 .0013 .0001 
26 8863 6253 .4686 3121 1815 .0908 .0387 .0139 0011 .0000 
28 8852 .6196 .4610 3039 1744 .0857 .0357 0125 .0009 .0000 
30 8843 6145 4543 .2968 -1682 .0814 .0333 0114 .0008 .0000 
40 .8807 5960 .4302 .2717 1471 .0672 .0256 008 1 .0004 .0000 
60 .8768 .5760 .4050 .2464 .1270 0545 .0193 .0055 .0002 .0000 
120 8724 5547 .3789 .2214 .1082 0434 .0141 .0037 .0001 .0000 
∞ .8677 .5319 .3520 .1967 .0907 .0339 .0101 .0024 .0000 .0000
ν1 = 7
2 .9363 8963 .8736 .8476 8185 .7868 .7527 .7168 .6410 5627 
4 9245 8368 7811 .7144 .6387 5571 4735 3916 .2460 .1376 
6 9157 1878 .7042 .6054 4978 3897 .2895 .2035 .0846 .0278 
8 .9090 .7492 .6449 5247 4002 .2841 .1868 1133 .0323 .0065 
10 .9038 .7189 5992 4652 3328 2174 1288 0689 0143 0019 
12 8996 .6947 5636 .4207 2852 1738 .0944 0454 .0072 .0007 
14 8961 .6750 5353 3866 .2505 1439 .0726 .0320 0041 .0003 
16 8933 6588 5125 3598 2243 1226 0582 .0238 .0025 .0001 
18 8908 6452 .4936 .3383 2041 .1070 .0482 0185 .0017 .0001 
20 8888 .6336 .4779 3208 1882 0951 .0409 .0149 .0012 .0000 
22 8870 6238 .4646 3062 1753 0858 0355 .0123 .0009 .0000 
24 8854 6152 4532 .2940 .1647 0785 .0314 .0105 .0007 .0000 
26 .8840 .6077 4433 .2836 1559 0725 .0282 0091 .0005 .0000 
28 8828 6011 4347 2747 1485 .0676 .0256 .0080 .0004 .0000 
30 8817 5952 4272 .2669 1421 .0634 .0234 .007 1 .0003 .0000 
40 .8776 5737 .3998 .2396 1206 0501 .0170 .0046 .0002 .0000 
60 .8730 5504 3713 2124 1005 .0387 O19 .0029 .0001 .0000 
120 .8679 5253 .3417 1857 0821 .0290 .0080 .0017 0000 .0000 
oe) .8622 4983 3112 1597 .0656 0211 .0052 .0010 .0000 .0000 
ν1 = 8
2 .9365 .8971 .8747 .8490 .8203 .7889 .7553 .7198 .6448 .5671
4 .9247 .8374 .7817 .7149 .6391 .5574 .4735 .3914 .2456 .1371
6 .9158 .7869 .7024 .6023 .4935 .3845 .2839 .1981 .0809 .0259
8 .9088 .7464 .6398 .5173 .3910 .2744 .1777 .1059 .0289 .0055
10 .9033 .7140 .5910 .4540 .3200 .2049 .1184 .0615 .0118 .0014
12 .8989 .6878 .5526 .4063 .2697 .1598 .0838 .0387 .0055 .0005
14 .8951 .6663 .5218 .3696 .2330 .1292 .0624 .0260 .0029 .0002
16 .8920 .6484 .4968 .3407 .2056 .1077 .0485 .0186 .0017 .0001
18 .8894 .6334 .4761 .3176 .1846 .0921 .0390 .0139 .0010 .0000
20 .8871 .6205 .4588 .2988 .1680 .0804 .0323 .0108 .0007 .0000
22 .8851 .6095 .4441 .2832 .1548 .0713 .0274 .0087 .0005 .0000
24 .8834 .5999 .4315 .2700 .1439 .0642 .0237 .0072 .0003 .0000
26 .8819 .5915 .4206 .2589 .1349 .0585 .0208 .0060 .0003 .0000
28 .8805 .5840 .4111 .2493 .1274 .0538 .0186 .0052 .0002 .0000
30 .8793 .5774 .4027 .2410 .1209 .0499 .0168 .0045 .0002 .0000
40 .8746 .5530 .3725 .2120 .0995 .0377 .0114 .0027 .0001 .0000
60 .8694 .5264 .3408 .1834 .0798 .0275 .0074 .0015 .0000 .0000
120 .8635 .4975 .3081 .1556 .0623 .0193 .0046 .0008 .0000 .0000
∞ .8568 .4663 .2745 .1292 .0472 .0130 .0027 .0004 .0000 .0000
TABLE VII (continued)

ν2 φ = .5 1.2 2.0
ν1 = 9
2 .9366 .8756 .7573
4 .9249 .7821 .4735
6 .9158 .7007 .2792
8 .9087 .6354 .1702
10 .9029 .5838 .1099
12 .8982 .5428 .0753
14 .8943 .5097 .0543
16 .8909 .4827 .0409
18 .8881 .4604 .0320
20 .8856 .4416 .0259
22 .8835 .4257 .0214
24 .8816 .4120 .0181
26 .8799 .4002 .0156
28 .8784 .3898 .0137
30 .8770 .3807 .0121
40 .8718 .3477 .0077
60 .8660 .3133 .0046
120 .8592 .2778 .0026
∞ .8514 .2417 .0014
ν1 = 10
2 .9368 .8762 .7589
4 .9250 .7825 .4734
6 .9158 .6992 .2751
8 .9085 .6315 .1638
10 .9026 .5774 .1028
12 .8976 .5340 .0683
14 .8935 .4990 .0478
16 .8899 .4702 .0350
18 .8869 .4463 .0267
20 .8843 .4262 .0210
22 .8819 .4091 .0170
24 .8799 .3944 .0141
26 .8780 .3817 .0119
28 .8764 .3705 .0102
30 .8749 .3607 .0089
40 .8692 .3253 .0053
60 .8627 .2884 .0029
120 .8551 .2506 .0015
∞ .8462 .2124 .0007
ν1 = 12
2 .9369 .8772 .7614
4 .9252 .7829 .4731
6 .9159 .6968 .2686
8 .9082 .6250 .1536
10 .9019 .5666 .0917
12 .8966 .5192 .0577
14 .8921 .4805 .0382
16 .8882 .4485 .0265
18 .8848 .4219 .0192
20 .8818 .3994 .0144
22 .8792 .3803 .0111
24 .8768 .3638 .0088
26 .8747 .3496 .0072
28 .8728 .3371 .0059
30 .8710 .3261 .0050
40 .8643 .2865 .0026
60 .8565 .2456 .0012
120 .8472 .2042 .0005
∞ .8359 .1632 .0002
φ = 2.6, ν1 = 9 (partial): .6477 .2452 .0778 .0262 .0100 ... .0021 .0011
TABLE VII (continued)

α = 0.10
ν2 φ = .5 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.6 3.0
ν1 = 1
2 .8582 .7443 .6846 .6202 .5534 .4863 .4209 .3588 .2491 .1628
4 .8410 .6773 .5919 .5017 .4118 .3266 .2500 .1846 .0899 .0375
6 .8336 .6498 .5552 .4570 .3613 .2738 .1985 .1373 .0570 .0194
8 .8296 .6353 .5363 .4344 .3367 .2490 .1753 .1172 .0447 .0137
10 .8271 .6265 .5248 .4209 .3223 .2348 .1623 .1063 .0385 .0110
12 .8254 .6205 .5171 .4120 .3128 .2256 .1541 .0996 .0348 .0096
14 .8242 .6162 .5116 .4057 .3062 .2192 .1485 .0949 .0324 .0086
16 .8232 .6130 .5075 .4010 .3012 .2145 .1444 .0916 .0307 .0080
18 .8225 .6104 .5043 .3973 .2974 .2109 .1412 .0891 .0294 .0075
20 .8219 .6084 .5017 .3944 .2944 .2081 .1388 .0871 .0285 .0072
22 .8214 .6068 .4996 .3920 .2920 .2058 .1368 .0856 .0277 .0069
24 .8210 .6054 .4979 .3900 .2900 .2038 .1351 .0843 .0271 .0067
26 .8207 .6042 .4964 .3884 .2882 .2022 .1338 .0832 .0266 .0065
28 .8204 .6032 .4951 .3869 .2868 .2009 .1326 .0823 .0261 .0063
30 .8201 .6023 .4940 .3857 .2855 .1997 .1316 .0815 .0257 .0062
40 .8192 .5993 .4902 .3814 .2811 .1956 .1281 .0788 .0245 .0058
60 .8183 .5962 .4864 .3771 .2768 .1916 .1248 .0762 .0233 .0054
120 .8174 .5932 .4826 .3729 .2726 .1877 .1215 .0737 .0222 .0050
∞ .8165 .5901 .4788 .3686 .2683 .1838 .1183 .0713 .0211 .0047
ν1 = 2
2 .8669 .7746 .7252 .6707 .6130 .5536 .4939 .4355 .3265 .2333
4 .8486 .6981 .6159 .5268 .4358 .3481 .2680 .1987 .0972 .0405
6 .8392 .6593 .5623 .4595 .3586 .2662 .1876 .1525 .0471 .0140
8 .8335 .6369 .5319 .4228 .3183 .2260 .1508 .0944 .0302 .0073
10 .8298 .6223 .5126 .4000 .2940 .2027 .1305 .0783 .0226 .0048
12 .8272 .6122 .4994 .3846 .2780 .1877 .1179 .0686 .0184 .0035
14 .8252 .6047 .4897 .3734 .2666 .1772 .1093 .0622 .0158 .0028
16 .8237 .5990 .4823 .3651 .2581 .1696 .1031 .0578 .0140 .0024
18 .8225 .5945 .4765 .3586 .2516 .1638 .0984 .0544 .0128 .0021
20 .8215 .5908 .4719 .3533 .2464 .1592 .0948 .0519 .0119 .0019
22 .8207 .5878 .4680 .3490 .2422 .1555 .0919 .0498 .0112 .0017
24 .8200 .5853 .4648 .3455 .2387 .1524 .0895 .0482 .0106 .0016
26 .8194 .5831 .4621 .3425 .2357 .1498 .0875 .0468 .0102 .0015
28 .8190 .5812 .4598 .3399 .2332 .1477 .0859 .0457 .0098 .0014
30 .8185 .5796 .4577 .3376 .2310 .1458 .0844 .0447 .0095 .0014
40 .8169 .5739 .4506 .3298 .2235 .1394 .0796 .0415 .0084 .0011
60 .8153 .5681 .4433 .3220 .2160 .1331 .0749 .0384 .0075 .0010
120 .8137 .5622 .4361 .3142 .2087 .1270 .0705 .0355 .0067 .0008
∞ .8120 .5562 .4288 .3064 .2015 .1211 .0662 .0328 .0059 .0007
ν1 = 3
2 .8700 .7858 .7403 .6899 .6359 .5799 .5231 .4668 .3597 .2655
4 .8513 .7047 .6230 .5336 .4416 .3525 .2709 .2002 .0970 .0399
6 .8406 .6585 .5581 .4514 .3468 .2521 .1729 .1117 .0386 .0103
8 .8338 .6298 .5190 .4040 .2953 .2016 .1281 .0755 .0208 .0041
10 .8291 .6106 .4933 .3738 .2638 .1724 .1038 .0574 .0135 .0022
12 .8257 .5968 .4752 .3531 .2429 .1537 .0890 .0470 .0098 .0014
14 .8232 .5864 .4618 .3380 .2280 .1408 .0792 .0404 .0077 .0010
16 .8211 .5784 .4516 .3266 .2170 .1315 .0723 .0359 .0064 .0007
18 .8195 .5720 .4434 .3177 .2085 .1244 .0671 .0326 .0055 .0006
20 .8182 .5668 .4369 .3105 .2017 .1189 .0632 .0301 .0049 .0005
22 .8170 .5624 .4314 .3047 .1963 .1145 .0601 .0282 .0044 .0004
24 .8161 .5588 .4268 .2998 .1917 .1108 .0576 .0267 .0040 .0004
26 .8153 .5556 .4230 .2956 .1879 .1078 .0555 .0255 .0038 .0003
28 .8145 .5529 .4196 .2921 .1847 .1053 .0537 .0244 .0035 .0003
30 .8139 .5506 .4167 .2890 .1819 .1031 .0523 .0236 .0033 .0003
40 .8117 .5421 .4063 .2782 .1722 .0956 .0473 .0207 .0027 .0002
60 .8093 .5335 .3959 .2674 .1627 .0885 .0427 .0182 .0022 .0002
120 .8069 .5246 .3853 .2567 .1535 .0817 .0384 .0159 .0018 .0001
∞ .8044 .5156 .3745 .2460 .1444 .0752 .0344 .0138 .0015 .0001
ν1 = 4
2 .8716 .7916 .7482 .6999 .6480 .5939 .5387 .4837 .3781 .2836
4 .8527 .7077 .6259 .5360 .4432 .3532 .2708 .1995 .0958 .0390
6 .8410 .6559 .5527 .4429 .3358 .2399 .1610 .1012 .0327 .0080
8 .8333 .6223 .5066 .3871 .2758 .1821 .1110 .0622 .0151 .0026
10 .8279 .5989 .4754 .3509 .2388 .1489 .0846 .0436 .0086 .0011
12 .8238 .5818 .4531 .3257 .2142 .1279 .0689 .0333 .0056 .0006
14 .8207 .5689 .4365 .3074 .1968 .1136 .0587 .0271 .0040 .0004
16 .8182 .5587 .4236 .2935 .1839 .1033 .0517 .0229 .0031 .0002
18 .8161 .5505 .4133 .2826 .1740 .0957 .0467 .0201 .0025 .0002
20 .8144 .5438 .4050 .2738 .1662 .0898 .0428 .0179 .0021 .0001
22 .8130 .5382 .3981 .2666 .1599 .0851 .0398 .0163 .0018 .0001
24 .8118 .5334 .3922 .2606 .1547 .0812 .0375 .0151 .0016 .0001
26 .8107 .5293 .3872 .2555 .1503 .0781 .0355 .0141 .0014 .0001
28 .8098 .5258 .3829 .2512 .1466 .0754 .0339 .0133 .0013 .0001
30 .8090 .5226 .3792 .2474 .1434 .0732 .0326 .0126 .0012 .0001
40 .8061 .5115 .3659 .2343 .1325 .0655 .0281 .0104 .0009 .0000
60 .8030 .5000 .3524 .2211 .1219 .0584 .0241 .0085 .0007 .0000
120 .7997 .4881 .3387 .2081 .1117 .0518 .0206 .0069 .0005 .0000
∞ .7963 .4758 .3248 .1952 .1019 .0457 .0174 .0056 .0003 .0000
TABLE VII (continued)

α = 0.10
ν2 φ = .5 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.6 3.0
ν1 = 5
2 .8726 .7952 .7530 .7061 .6555 .6026 .5485 .4943 .3897 .2953
4 .8534 .7093 .6273 .5369 .4435 .3528 .2699 .1983 .0945 .0381
6 .8412 .6532 .5477 .4354 .3266 .2300 .1516 .0933 .0286 .0066
8 .8327 .6154 .4957 .3728 .2599 .1668 .0982 .0528 .0115 .0017
10 .8266 .5885 .4599 .3316 .2187 .1308 .0707 .0343 .0058 .0006
12 .8219 .5685 .4339 .3029 .1913 .1084 .0548 .0245 .0034 .0003
14 .8183 .5532 .4144 .2818 .1720 .0934 .0448 .0188 .0022 .0002
16 .8153 .5410 .3991 .2658 .1578 .0827 .0380 .0152 .0016 .0001
18 .8129 .5311 .3869 .2532 .1470 .0749 .0332 .0128 .0012 .0001
20 .8109 .5229 .3770 .2432 .1385 .0689 .0297 .0110 .0010 .0000
22 .8092 .5161 .3687 .2349 .1316 .0642 .0270 .0097 .0008 .0000
24 .8077 .5103 .3618 .2280 .1260 .0604 .0249 .0087 .0007 .0000
26 .8064 .5053 .3558 .2222 .1213 .0573 .0232 .0080 .0006 .0000
28 .8053 .5009 .3507 .2173 .1174 .0547 .0218 .0074 .0005 .0000
30 .8043 .4971 .3462 .2129 .1140 .0525 .0206 .0068 .0004 .0000
40 .8007 .4833 .3302 .1979 .1024 .0452 .0169 .0053 .0003 .0000
60 .7968 .4689 .3140 .1830 .0914 .0386 .0137 .0040 .0002 .0000
120 .7927 .4539 .2974 .1684 .0810 .0327 .0109 .0030 .0001 .0000
∞ .7883 .4384 .2807 .1540 .0712 .0274 .0086 .0022 .0001 .0000
ν1 = 6
2 .8732 .7976 .7563 .7103 .6606 .6085 .5552 .5016 .3978 .3035
4 .8540 .7102 .6281 .5373 .4434 .3522 .2689 .1971 .0934 .0373
6 .8411 .6507 .5432 .4291 .3189 .2220 .1442 .0872 .0256 .0056
8 .8321 .6094 .4864 .3609 .2469 .1548 .0884 .0459 .0092 .0012
10 .8254 .5794 .4465 .3155 .2023 .1168 .0604 .0278 .0041 .0004
12 .8202 .5568 .4174 .2837 .1728 .0935 .0446 .0187 .0022 .0001
14 .8161 .5392 .3952 .2603 .1522 .0781 .0350 .0136 .0013 .0001
16 .8127 .5252 .3779 .2426 .1371 .0674 .0286 .0104 .0009 .0000
18 .8100 .5137 .3640 .2287 .1257 .0596 .0243 .0084 .0006 .0000
20 .8076 .5042 .3526 .2176 .1167 .0538 .0211 .0070 .0005 .0000
22 .8056 .4962 .3432 .2085 .1096 .0492 .0187 .0060 .0004 .0000
24 .8039 .4894 .3352 .2009 .1038 .0456 .0169 .0052 .0003 .0000
26 .8024 .4835 .3283 .1945 .0989 .0426 .0154 .0046 .0002 .0000
28 .8011 .4783 .3224 .1891 .0949 .0402 .0142 .0042 .0002 .0000
30 .7999 .4738 .3172 .1844 .0914 .0382 .0133 .0038 .0002 .0000
40 .7956 .4574 .2989 .1680 .0797 .0315 .0103 .0027 .0001 .0000
60 .7910 .4403 .2802 .1519 .0688 .0257 .0078 .0019 .0001 .0000
120 .7859 .4223 .2612 .1362 .0587 .0206 .0058 .0013 .0000 .0000
∞ .7805 .4035 .2420 .1210 .0494 .0162 .0042 .0009 .0000 .0000
ν1 = 7
2 .8737 .7994 .7587 .7133 .6643 .6129 .5600 .5069 .4037 .3095
4 .8543 .7108 .6285 .5373 .4431 .3515 .2680 .1960 .0924 .0367
6 .8411 .6485 .5394 .4238 .3126 .2154 .1383 .0825 .0234 .0049
8 .8315 .6042 .4784 .3509 .2362 .1451 .0808 .0407 .0076 .0009
10 .8243 .5714 .4351 .3020 .1890 .1057 .0525 .0231 .0031 .0002
12 .8186 .5465 .4030 .2675 .1578 .0819 .0371 .0146 .0015 .0001
14 .8141 .5269 .3786 .2423 .1362 .0665 .0279 .0101 .0008 .0000
16 .8103 .5111 .3594 .2231 .1204 .0559 .0221 .0074 .0005 .0000
18 .8072 .4982 .3440 .2081 .1086 .0483 .0181 .0057 .0003 .0000
20 .8046 .4874 .3313 .1962 .0995 .0427 .0153 .0046 .0002 .0000
22 .8024 .4784 .3208 .1864 .0922 .0383 .0132 .0038 .0002 .0000
24 .8004 .4705 .3119 .1783 .0863 .0349 .0117 .0032 .0001 .0000
26 .7987 .4638 .3043 .1715 .0815 .0322 .0105 .0028 .0001 .0000
28 .7972 .4579 .2977 .1657 .0774 .0300 .0095 .0024 .0001 .0000
30 .7958 .4527 .2919 .1606 .0740 .0281 .0087 .0022 .0001 .0000
40 .7908 .4339 .2715 .1433 .0625 .0222 .0063 .0014 .0000 .0000
60 .7854 .4140 .2507 .1264 .0520 .0172 .0045 .0009 .0000 .0000
120 .7794 .3932 .2296 .1102 .0426 .0130 .0031 .0006 .0000 .0000
∞ .7728 .3712 .2083 .0948 .0342 .0096 .0021 .0003 .0000 .0000
ν1 = 8
2 .8740 .8006 .7604 .7156 .6670 .6160 .5636 .5109 .4081 .3140
4 .8546 .7112 .6287 .5373 .4427 .3509 .2671 .1950 .0915 .0362
6 .8410 .6466 .5361 .4192 .3072 .2100 .1334 .0786 .0216 .0043
8 .8310 .5996 .4716 .3423 ? .1371 .0747 .0367 .0064 .0007
10 .8233 .5645 .4251 .2904 .1778 .0968 .0465 .0196 .0023 .0002
12 .8172 .5373 .3906 .2538 .1454 .0727 .0315 .0117 .0010 .0000
14 .8123 .5159 .3641 .2269 .1230 .0574 .0228 .0077 .0005 .0000
16 .8082 .4986 .3432 .2066 .1069 .0470 .0174 .0054 .0003 .0000
18 .8048 .4843 .3264 .1907 .0949 .0397 .0138 .0040 .0002 .0000
20 .8019 .4724 .3126 .1781 .0857 .0344 .0114 .0031 .0001 .0000
22 .7994 .4622 .3012 .1678 .0784 .0303 .0096 .0025 .0001 .0000
24 .7972 .4535 .2914 .1593 .0726 .0272 .0083 .0020 .0001 .0000
26 .7952 .4460 .2831 .1521 .0678 .0247 .0072 .0017 .0000 .0000
28 .7935 .4394 .2759 .1460 .0638 .0226 .0065 .0015 .0000 .0000
30 .7920 .4335 .2696 .1408 .0604 .0210 .0058 .0013 .0000 .0000
40 .7863 .4123 .2473 .1228 .0494 .0158 .0040 .0008 .0000 .0000
60 .7801 .3899 .2247 .1056 .0395 .0116 .0026 .0004 .0000 .0000
120 .7731 .3663 .2019 .0893 .0309 .0082 .0016 .0002 .0000 .0000
∞ .7653 .3414 .1791 .0740 .0235 .0056 .0010 .0001 .0000 .0000
α = 0.10
ν2 φ = .5 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.6 3.0
ν1 = 9
2 .8743 .8017 .7619 .7174 .6693 .6186 .5666 .5141 .4117 .3177
4 .8548 .7115 .6288 .5372 .4423 .3503 .2663 .1942 .0908 .0357
6 .8409 .6449 .5333 .4153 .3027 .2055 .1294 .0755 .0202 .0039
8 .8305 .5956 .4656 .3350 .2198 .1305 .0698 .0335 .0055 .0006
10 .8224 .5583 .4165 .2805 .1685 .0894 .0417 .0170 .0019 .0001
12 .8160 .5293 .3797 .2420 .1350 .0653 .0271 .0096 .0007 .0000
14 .8107 .5061 .3513 .2137 .1121 .0501 .0189 .0060 .0003 .0000
16 .8063 .4873 .3290 .1924 .0958 .0401 .0139 .0040 .0002 .0000
18 .8025 .4718 .3109 .1758 .0837 .0331 .0107 .0028 .0001 .0000
20 .7994 .4588 .2962 .1627 .0745 .0281 .0086 .0021 .0001 .0000
22 .7966 .4476 .2838 .1520 .0673 .0243 .0071 .0016 .0000 .0000
24 .7942 .4381 .2734 .1432 .0616 .0214 .0059 .0013 .0000 .0000
26 .7921 .4298 .2644 .1358 .0569 .0191 .0051 .0011 .0000 .0000
28 .7902 .4225 .2567 .1295 .0530 .0173 .0045 .0009 .0000 .0000
30 .7885 .4161 .2500 .1241 .0498 .0159 .0040 .0008 .0000 .0000
40 .7821 .3927 .2261 .1058 .0393 .0114 .0025 .0004 .0000 .0000
60 .7750 .3678 .2019 .0885 .0302 .0078 .0015 .0002 .0000 .0000
120 .7671 .3415 .1777 .0724 .0224 .0052 .0009 .0001 .0000 .0000
∞ .7581 .3138 .1537 .0577 .0161 .0033 .0005 .0000 .0000 .0000
ν1 = 10
2 .8746 .8025 .7630 .7188 .6710 .6207 .5689 .5167 .4146 .3206
4 .8550 .7117 .6289 .5370 .4419 .3497 .2656 .1935 .0902 .0353
6 .8408 .6434 .5308 .4119 .2988 .2016 .1260 .0728 .0191 .0036
8 .8301 .5921 .4604 .3287 .2133 .1250 .0657 .0309 .0049 .0005
10 .8216 .5528 .4088 .2719 .1605 .0834 .0378 .0149 .0015 .0001
12 .8148 .5220 .3700 .2317 .1263 .0592 .0237 .0081 .0006 .0000
14 .8092 .4974 .3401 .2023 .1030 .0443 .0160 .0048 .0002 .0000
16 .8045 .4772 .3164 .1802 .0866 .0346 .0114 .0031 .0001 .0000
18 .8005 .4605 .2972 .1630 .0745 .0279 .0085 .0021 .0001 .0000
20 .7971 .4464 .2815 .1494 .0654 .0232 .0066 .0015 .0000 .0000
22 .7941 .4344 .2684 .1384 .0583 .0197 .0053 ? .0000 .0000
24 .7914 .4240 .2573 .1294 .0527 .0171 .0044 .0009 .0000 .0000
26 .7891 .4150 .2479 .1219 .0482 .0151 .0037 .0007 .0000 .0000
28 .7870 .4071 .2397 .1155 .0445 .0134 .0031 .0006 .0000 .0000
30 .7852 .4001 .2326 .1101 .0414 .0121 .0027 .0005 .0000 .0000
40 .7782 .3746 .2073 .0916 .0315 .0083 .0016 .0002 .0000 .0000
60 .7703 .3475 .1819 .0745 .0232 .0054 .0009 .0001 .0000 .0000
120 .7613 .3186 .1566 .0588 .0163 .0033 .0005 .0000 .0000 .0000
∞ .7510 .2883 .1319 .0449 .0110 .0019 .0002 .0000 .0000 .0000
ν1 = 12
2 .8749 .8037 .7646 .7210 .6736 .6237 .5723 .5204 .4188 .3250
4 .8552 .7120 .6289 .5368 .4413 .3488 .2645 .1923 .0893 .0348
6 .8406 .6409 .5267 .4064 .2925 .1954 .1207 .0688 .0174 .0032
8 .8293 .5862 ? .3182 .2029 .1161 .0594 .0270 .0039 .0003
10 .8203 .5436 .3961 .2577 .1477 .0739 .0320 .0120 .0011 .0001
12 .8129 .5097 .3538 .2149 .1123 .0500 .0187 .0059 .0003 .0000
14 .8067 .4823 .3210 .1836 .0887 .0356 .0118 .0032 .0001 .0000
16 .8014 .4597 .2950 .1603 .0722 .0266 .0079 .0019 .0001 .0000
18 .7969 .4409 .2740 .1423 .0603 .0206 .0056 .0012 .0000 .0000
20 .7930 .4249 .2568 .1281 .0515 .0164 .0041 .0008 .0000 .0000
22 .7896 .4113 .2424 .1167 .0448 .0135 .0031 .0006 .0000 .0000
24 .7865 .3994 .2303 .1074 .0396 .0113 .0025 .0004 .0000 .0000
26 .7838 .3892 .2199 .0998 .0354 .0096 .0020 .0003 .0000 .0000
28 .7814 .3801 .2110 .0933 .0320 .0083 .0016 .0002 .0000 .0000
30 .7792 .3721 .2032 .0879 .0292 .0073 .0014 .0002 .0000 .0000
40 .7709 .3427 .1759 .0697 .0207 .0045 .0007 .0001 .0000 .0000
60 .7614 .3114 .1486 .0532 .0139 .0026 .0003 .0000 .0000 .0000
120 .7503 .2781 .1220 .0389 .0087 .0013 .0001 .0000 .0000 .0000
∞ .7373 .2431 .0967 .0270 .0051 .0006 .0000 .0000 .0000 .0000


From M. L. Tiku, “Tables of the Power of the F-Test,” Journal of the American
Statistical Association, 62 (1967), 525-539, and M. L. Tiku, “More Tables of
the Power of the F-Test,” Journal of the American Statistical Association, 67
(1972), 709-710. Abridged and adapted by permission.
Table VIII. Power Values and Optimum Number of Levels
for Total Number of Observations in the One-Way Random
Effects Analysis of Variance F Test


This table gives power estimates for the one-way random effects analysis of
variance F test for specified values of θ0 (the value of σα²/σe² under H0), θ
(the value of σα²/σe² under H1), N = an (total number of observations), a (the
number of treatment groups or levels), and α (the level of significance). Each
entry gives the optimum number of levels a followed by the corresponding power.
For example, in a one-way random effects analysis of variance, consider the simple
hypothesis that there are no treatment effects (θ0 = 0), and suppose the researcher
wants to reject the null hypothesis if σα²/σe² is as large as 1.0 (θ = 1.0) at α = 0.05.
For 20 treatment groups with 5 subjects per group (a = 20, n = 5, N = 100),
the power of the test is 0.998.
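The power entries can be approximated directly. A minimal sketch, assuming the balanced one-way random effects model with a levels and n observations per level: MS_A/MS_E is distributed as (1 + nθ) times a central F variable with a − 1 and a(n − 1) degrees of freedom, so the test that rejects H0: θ = θ0 when MS_A/[(1 + nθ0)MS_E] exceeds F(1 − α) has power P{F > [(1 + nθ0)/(1 + nθ)]·F(1 − α)} at θ. The function names (betainc, f_cdf, f_ppf, random_effects_power) are hypothetical, and the incomplete beta function is hand-rolled (continued fraction, modified Lentz method) only to avoid a SciPy dependency.

```python
import math


def _betacf(a, b, x, max_iter=1000, eps=1e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300

    def fix(v):  # guard against division by (almost) zero
        return v if abs(v) > tiny else tiny

    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / fix(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 / fix(1.0 + aa * d)
            c = fix(1.0 + aa / c)
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):  # continued fraction converges fastest here
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(x, d1, d2):
    """CDF of the central F distribution with (d1, d2) degrees of freedom."""
    if x <= 0.0:
        return 0.0
    return betainc(d1 / 2.0, d2 / 2.0, d1 * x / (d1 * x + d2))


def f_ppf(prob, d1, d2, iters=200):
    """Quantile of the central F distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < prob:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def random_effects_power(a, n, theta, theta0=0.0, alpha=0.05):
    """Power of the one-way random effects F test of H0: theta = theta0 against
    theta > theta0, where theta = sigma_alpha^2 / sigma_e^2."""
    d1, d2 = a - 1, a * (n - 1)
    f_crit = f_ppf(1.0 - alpha, d1, d2)
    # MS_A / MS_E ~ (1 + n*theta) * F(d1, d2), so rescale the critical value
    scale = (1.0 + n * theta0) / (1.0 + n * theta)
    return 1.0 - f_cdf(scale * f_crit, d1, d2)


print(random_effects_power(a=20, n=5, theta=1.0))  # compare with the tabulated .998
```

For a = 20, n = 5, θ0 = 0, θ = 1.0 and α = 0.05 the result should be close to the tabulated 0.998. The sketch only evaluates power for a given a and n; the table's search over a for a fixed N is not reproduced here.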
α = 0.01
θ0 = 0.00
N\θ 0.2 0.4 0.6 0.8 1.0 2.0 3.0 4.0
10 2, .045 2, .089 2, .132 2, .172 2, .208 2, .341 2, .426 2, .485
20 2, .114 2, .214 4, .302 4, .394 4, .471 5, .706 5, .822 5, .881
30 3, .180 3, .348 5, .474 5, .586 5, .668 6, .875 10, .948 10, .977
40 3, .246 4, .463 5, .615 5, .716 8, .795 10, .954 10, .986 13, .995
50 3, .306 5, .561 5, .708 7, .809 10, .878 10, .981 16, .996 16, .999
60 4, .383 6, .644 6, .788 10, .881 10, .930 15, .994 20, .999 12, 1.00
70 5, .439 7, .713 10, .852 10, .924 14, .960 14, .998 14, 1.00 10, 1.00
80 5, .497 8, .770 10, .896 10, .950 16, .977 20, .999 11, 1.00 9, 1.00
90 5, .547 9, .817 10, .925 15, .970 18, .988 15, 1.00 10, 1.00 8, 1.00
100 5, .591 10, .855 11, .946 20, .980 20, .993 14, 1.00 9, 1.00 8, 1.00
300 15, .965 30, .999 8, 1.00 7, 1.00 6, 1.00 6, 1.00 6, 1.00 5, 1.00
500 15, 1.00 8, 1.00 7, 1.00 7, 1.00 6, 1.00 6, 1.00 6, 1.00 5, 1.00
θ0 = 0.10
N\θ 0.3 0.5 0.7 0.9 1.0 2.0 3.0 4.0
10 2, .032 2, .059 2, .089 2, .118 2, .13? 2, .250 2, .334 2, .396
20 2, .057 4, .120 4, .194 4, .267 4, .302 5, .567 5, .718 5, .804
30 3, .082 5, .188 5, .302 6, .405 6, .454 6, .751 10, .891 10, .948
40 4, .106 ? 8, .395 8, .527 8, .582 10, .878 10, .958 13, .984
50 5, .130 7, .309 10, .484 10, .629 10, .686 10, .931 16, .984 16, .996
60 6, .153 10, .371 10, .569 12, .713 12, .768 15, .971 20, .996 20, .999
70 7, .176 10, .430 14, .636 14, .781 14, .830 23, .985 23, .999 17, 1.00
80 8, .199 10, .479 16, .697 16, .834 16, .877 20, .994 26, 1.00 15, 1.00
90 10, .223 15, .530 15, .751 18, .876 18, .912 30, .997 18, 1.00 14, 1.00
100 10, .245 14, .569 20, .795 20, .907 20, .978 25, .999 16, 1.00 12, 1.00
300 30, .632 50, .967 60, .998 50, 1.00 30, 1.00 17, 1.00 10, 1.00 8, 1.00
500 62, .850 50, .998 22, 1.00 16, 1.00 15, 1.00 10, 1.00 8, 1.00 8, 1.00
θ0 = 0.50
N\θ 0.7 0.8 0.9 1.0 2.0 3.0 4.0 5.0
10 2, .018 2, .023 2, .028 2, .034 2, .095 2, .155 2, .208 ?
20 4, .024 4, .034 4, .045 4, .057 5, .218 5, .380 5, .508 5, .604
30 5, .029 6, .043 6, .060 6, .080 10, .329 10, .570 10, .730 10, .828
40 8, .034 8, .053 8, .076 8, .102 10, .445 13, .344 13, .847 13, .919
50 10, .039 10, .062 10, .091 10, .125 16, .526 16, .799 16, .916 16, .963
60 12, .043 12, .071 12, .107 15, .149 20, .634 20, .885 20, .964 20, .988
70 14, .048 14, .081 14, .122 14, .171 23, .703 23, .926 23, .982 23, .995
80 16, .052 16, .090 20, .139 20, .197 20, .762 26, .953 26, .991 40, .998
90 18, .056 18, .099 18, .154 18, .218 30, .823 30, .975 30, .996 45, .999
100 20, .061 20, .109 25, .175 25, .246 33, .856 33, .985 33, .998 30, 1.00
500 125, .255 125, .511 125, .739 125, .884 68, 1.00 20, 1.00 15, 1.00 13, 1.00
1000 250, .503 250, .828 333, .966 190, 1.00 35, 1.00 20, 1.00 15, 1.00 13, 1.00
θ0 = 1.00
N\θ 1.2 1.4 1.6 1.8 2.0 3.0 4.0 5.0
10 2, .015 2, .020 2, .025 2, .032 2, .038 2, .074 2, .111 2, .146
20 4, .017 5, .027 5, .039 5, .054 5, .070 5, .166 5, .270 5, .365
30 6, .020 6, .034 6, .052 6, .073 10, .098 10, .260 10, .429 10, .570
40 10, .022 10, .040 10, .065 10, .096 10, .132 10, .344 13, .550 13, .702
50 10, .024 10, .046 10, .076 12, .113 12, .157 16, .425 16, .653 16, .799
60 15, .026 15, .053 15, .091 15, .140 20, .198 20, .525 20, .760 20, .885
70 14, .027 17, .058 23, .102 23, .161 23, .229 23, .592 23, .821 23, .926
80 20, .029 20, .065 20, .118 20, .185 20, .262 26, .651 26, .869 40, .955
90 22, .031 30, .071 30, .132 30, .211 30, .302 30, .721 30, .914 30, .975
100 25, .033 25, .078 25, .145 33, .233 33, .333 33, .765 33, .938 50, .985
500 166, .103 166, .365 166, .678 166, .882 166, .967 94, 1.00 34, 1.00 24, 1.00
1000 333, .202 333, .677 333, .994 250, 1.00 142, 1.00 52, 1.00 34, 1.00 24, 1.00
α = 0.05
θ0 = 0.00
N\θ 0.2 0.4 0.6 0.8 1.0 2.0 3.0 4.0
10 2, .142 2, .220 2, .282 2, .333 2, .374 3, .518 3, .622 5, .693
20 2, .241 4, .386 4, .507 4, .596 4, .662 5, .847 5, .914 5, .945
30 3, .342 5, .530 5, .666 6, .756 6, .818 10, .949 10, .984 10, .994
40 4, .424 5, .645 8, .768 8, .854 8, .904 10, .984 13, .996 13, .999
50 5, .495 5, .724 10, .843 10, .914 10, .950 16, .994 16, .999 12, 1.00
60 5, .564 6, .792 10, .900 12, .950 12, .974 20, .999 12, 1.00 10, 1.00
70 5, .621 7, .843 10, .934 14, .971 14, .987 23, 1.00 10, 1.00 8, 1.00
80 5, .668 10, .883 10, .955 16, .983 16, .993 13, 1.00 9, 1.00 8, 1.00
90 6, .715 10, .913 15, .965 18, .990 18, .997 11, 1.00 8, 1.00 7, 1.00
100 7, .746 10, .933 14, .980 20, .995 20, .998 10, 1.00 8, 1.00 7, 1.00
300 20, .989 23, 1.00 8, 1.00 7, 1.00 6, 1.00 5, 1.00 5, 1.00 5, 1.00
500 10, 1.00 7, 1.00 7, 1.00 6, 1.00 6, 1.00 5, 1.00 5, 1.00 5, 1.00
θ0 = 0.10
N\θ 0.3 0.5 0.7 0.9 1.0 2.0 3.0 4.0
10 2, .112 2, .169 2, .220 2, .263 2, .282 3, .436 3, .547 5, .628
20 4, .163 5, .283 4, .386 5, .473 5, .515 5, .753 5, .854 10, .907
30 5, .211 5, .377 6, .513 6, .618 6, .661 10, .884 10, .962 10, .984
40 5, .255 8, .445 8, .616 8, .727 10, .771 10, .952 13, .988 13, .996
50 7, .290 10, .526 10, .698 10, .806 10, .843 16, .978 16, .996 16, .999
60 10, .325 10, .592 12, .764 15, .863 15, .897 20, .993 20, .999 14, 1.00
70 10, .364 10, .644 14, .816 14, .904 14, .930 23, .997 17, 1.00 13, 1.00
80 10, .397 16, .692 16, .857 20, .934 20, .955 26, .999 15, 1.00 11, 1.00
90 10, .426 15, .738 18, .890 18, .954 18, .969 30, 1.00 14, 1.00 10, 1.00
100 11, .453 20, .771 20, .915 25, .969 25, .981 20, 1.00 12, 1.00 10, 1.00
300 37, .821 50, .991 48, 1.00 26, 1.00 21, 1.00 15, 1.00 8, 1.00 7, 1.00
500 71, .949 29, 1.00 17, 1.00 13, 1.00 12, 1.00 9, 1.00 8, 1.00 7, 1.00
θ0 = 0.50
N\θ 0.7 0.8 0.9 1.0 2.0 3.0 4.0 5.0
10 2, .076 2, .090 2, .103 2, .116 3, .239 3, .343 5, .429 5, .509
20 5, .095 ? 5, .147 5, .175 5, .429 5, .601 10, .720 10, .809
30 6, .109 6, .144 6, .181 6, .218 10, .578 10, .784 10, .884 10, .934
40 8, .122 10, .167 10, .215 10, .265 13, .677 13, .870 20, .945 20, .978
50 10, .134 10, .186 10, .242 10, .300 16, .755 16, .924 25, .977 25, .993
60 15, .145 15, .207 15, .275 15, .344 20, .832 20, .963 20, .991 30, .998
70 14, .155 14, .224 14, .298 17, .373 23, .875 23, .979 35, .996 35, .999
80 20, .166 20, .245 20, .330 20, .415 26, .907 26, .988 40, .999 24, 1.00
90 18, .175 18, .260 22, .351 30, .442 30, .938 30, .994 45, .999 21, 1.00
100 25, .186 25, .281 25, .381 25, .480 33, .955 33, .997 30, 1.00 20, 1.00
500 125, .499 125, .749 125, .899 166, .967 27, 1.00 16, 1.00 13, 1.00 11, 1.00
1000 250, .745 333, .944 237, 1.00 115, 1.00 27, 1.00 16, 1.00 13, 1.00 11, 1.00
θ0 = 1.00
N\θ 1.2 1.4 1.6 1.8 2.0 3.0 4.0 5.0
10 2, .065 2, .081 2, .096 3, .112 3, .129 3, .209 3, .281 5, .350
20 5, .076 5, .105 5, .137 5, .169 5, .203 5, .361 10, .491 10, .610
30 10, .083 10, .123 10, .168 10, .216 10, .267 10, .501 10, .672 10, .784
40 10, .090 10, .139 10, .195 10, .255 13, .317 13, .595 13, .771 20, .878
50 12, .094 16, .152 16, .219 16, .291 16, .365 16, .672 25, .848 25, .935
60 15, .101 20, .170 20, .251 20, .338 20, .424 20, .756 20, .906 30, .966
70 23, .106 23, .183 23, .274 23, .371 23, .466 23, .805 35, .938 35, .982
80 20, .112 20, .196 26, .296 26, .402 26, .505 26, .845 40, .961 40, .991
90 30, .116 30, .212 30, .325 30, .442 30, .533 30, .887 45, .976 45, .996
100 25, .121 33, .224 33, .346 33, .471 33, .587 33, .911 50, .985 50, .998
500 166, .277 166, .625 166, .868 166, .967 166, .993 77, 1.00 25, 1.00 19, 1.00
1000 333, .438 333, .867 333, .992 142, 1.00 100, 1.00 41, 1.00 25, 1.00 19, 1.00
10 2, .229
20 4, .331 
30 3, .444
40 4, .531 
50 5, .602 
60 5, .662 
70 7, .711
80 8, .753 
90 6, .791 
100 9, .821
300 20, .994 
500 9, 1.00 


10 2, .188 
20 4, .258 
30 5, .316 
40 5, .363 
50 7, .406
60 10, .448 
70 10, .487 
80 10, .520 
90 15, .550 
100 14, .577 
300 50, .892 
500 71, .975


10 2,
20 3,
30 6, 
40 10, 
50 10, 
60 15, 
70 14, 
80 20, 
90 18, 
100 25, 
500 125, 
1000 250, 


10 3, 
20 35 
30 10, 
40 10, 
50 16, 
60 20, 
70 23, 
80 20, 
90 30,
100 33, 
500 166, 
1000 333, 


From R. S. Barcikowski, 


124 
141 
152 
162 
171 
.180 
.187 
194 


202 


.209 
.408 
580 


1. 


16, 
20, 
23,
26, 
30, 
33, 
166, 
333, 


4 


149 
. 184 
211 
.232 
.253 
.275 
292 
.308 
.328 
343 
.749 
.928 


1.6 


Bi 

5, 
10, 
13, 
16, 
20, 
23,
26, 
30, 


35,3 
929 


166, 


.173 
227 
272 
.305 
.340 
375 
402 
A27 
459 


482 


233, 1.00 


α = 0.10
θ0 = 0.00
0.8 1.0 
2, .430 2, .470 
5, .694 5, .755 
6, .830 6, .877 
8, .906 10, .941 
10, .948 10, .971 
12, .971 15, .986 
14, .984 14, .993 
16, .991 20, .997 
18, .995 18, .998 
20, .998 20, .999 
7, 1.00 6, 1.00 
6, 1.00 6, 1.00 
θ0 = 0.10
0.9 1.0 
3, .360 3, .385
5, .592 5, .628
6, .720 6, .754 
10, .816 10, .850 
10, .873 10, .900 
15, .918 15, .940
14, .943 14, .960 
20, .964 20, .977 
18, .975 22, .984 
25, .984 25, .991 
23, 1.00 21, 1.00 
12, 1.00 11, 1.00 
θ0 = 0.50
1.0 2.0 
3, .196 3, .355 
5, .276 5, .551
10, .333 10, .700 
10, .386 13, .784 
12, .423 16, .845 
15, .474 20, .901 
23, .507 23, .930 
20, .548 26, .950 
30, .580 30, .969 
25, .612 33, .978
166, .985 23, 1.00 
93, 1.00 23, 1.00 
θ0 = 1.00
1.8 2.0 
3, .196 3, .219 
5, .269 5, .310
10, .333 10, .392 
13, .379 13, .449 
16, .426 16, .506 
20, .472 20, .561 
23, .507 23, .602 
26, .540 26, .639 
30, .580 30, .683 
33, .608 33, .713 
166, .985 166, .997 
116, 1.00 82, 1.00 


3.0 


5, .737 
5, .944 
10, .992 
13, .998 
16, 1.00 
10, 1.00 
9, 1.00 
8, 1.00 
8, 1.00 
7, 1.00 
5, 1.00 
5, 1.00 
4.0


5, .808 
10, .972 
10, .997 
13, 1.00 
10, 1.00 

8, 1.00 

7, 1.00 

7, 1.00 

6, 1.00 

6, 1.00 

5, 1.00 

5, 1.00 


4.0 


5, .758 
10, .953 
10, .992 
20, .998 
25, 1.00 
12, 1.00 
10, 1.00 
10, 1.00 

9, 1.00 

9, 1.00 

7, 1.00 

7, 1.00 


5.0 


5, .657 
10, .892 
15, .968 
20, .991 
25, .997 
30, .999 
23, 1.00 
20, 1.00 
18, 1.00 
20, 1.00 
10, 1.00 
10, 1.00 


5.0 


5, .502 
10, .743 
15, .871 
20, .936 
25, .969 
30, .985 
35, .993 
40, .997 
45, .999 
50, .999 
17, 1.00 
17, 1.00 


“Optimum Sample Size and Number of Levels in a 


One-Way Random Effects Analysis of Variance,” The Journal of Experimental 
Education, 41 (1973), 10-16. Reprinted by permission. 


634 The Analysis of Variance 


Table IX. Minimum Sample Size per Treatment Group Needed
for a Given Value of p, α, 1 − β, and Effect Size (C) in Sigma
Units


This table gives the minimum sample size per treatment group needed in
the one-way fixed effects analysis of variance design with p treatment groups,
corresponding to α = 0.10, 0.05, 0.01; 1 − β = 0.7, 0.8, 0.9, 0.95;
C = Δ/σe = 1.0 (0.25) 2 (0.5) 3; and p = 2 (1) 11, 13. Here, Δ designates the
magnitude of the difference between any pair of treatment groups that is
meaningful to detect with probability of at least 1 − β. For example, in a
one-way fixed effects analysis of variance design, for p = 3, α = 0.05,
1 − β = 0.8, and C = 1.0, the required sample size per treatment group is 21.
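The sample sizes can be checked with a noncentral F calculation. This is a sketch under an assumption the text does not state: the least-favorable configuration, in which two of the p means differ by Δ and the remaining means lie midway between them, so the noncentrality parameter is λ = nC²/2 with C = Δ/σe. All function names are hypothetical; the noncentral F CDF is computed as a Poisson mixture of incomplete beta functions, using a hand-rolled incomplete beta (continued fraction) rather than a library routine.

```python
import math


def _betacf(a, b, x, max_iter=1000, eps=1e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300

    def fix(v):  # guard against division by (almost) zero
        return v if abs(v) > tiny else tiny

    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / fix(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 / fix(1.0 + aa * d)
            c = fix(1.0 + aa / c)
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(x, d1, d2):
    # central F CDF expressed through the incomplete beta function
    return 0.0 if x <= 0.0 else betainc(d1 / 2.0, d2 / 2.0, d1 * x / (d1 * x + d2))


def f_ppf(prob, d1, d2, iters=200):
    # central F quantile by bisection
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < prob:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f_cdf(mid, d1, d2) < prob else (lo, mid)
    return 0.5 * (lo + hi)


def ncf_cdf(x, d1, d2, lam, tol=1e-12, max_terms=5000):
    """Noncentral F CDF: a Poisson(lam / 2) mixture of incomplete beta functions."""
    if lam == 0.0:
        return f_cdf(x, d1, d2)
    if x <= 0.0:
        return 0.0
    y, half = d1 * x / (d1 * x + d2), lam / 2.0
    total = cum = 0.0
    for j in range(max_terms):
        w = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))  # Poisson weight
        total += w * betainc(d1 / 2.0 + j, d2 / 2.0, y)
        cum += w
        if j > half and 1.0 - cum < tol:
            break
    return total


def fixed_effects_power(p, n, c, alpha):
    """Power when two of p means differ by c error SDs and the rest sit midway (assumed)."""
    d1, d2 = p - 1, p * (n - 1)
    lam = n * c * c / 2.0  # n * sum((mu_i - mu)^2) / sigma^2 for this configuration
    return 1.0 - ncf_cdf(f_ppf(1.0 - alpha, d1, d2), d1, d2, lam)


def min_n_per_group(p, alpha, target_power, c, n_max=10_000):
    """Smallest n per group whose power reaches target_power."""
    for n in range(2, n_max + 1):
        if fixed_effects_power(p, n, c, alpha) >= target_power:
            return n
    raise ValueError("target power not reached for n <= n_max")


print(min_n_per_group(p=3, alpha=0.05, target_power=0.80, c=1.0))
```

If the assumed configuration matches the one behind the table, the search for p = 3, α = 0.05, 1 − β = 0.8 and C = 1.0 should land at or very near the tabulated 21; a discrepancy would suggest the table rests on a different convention.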


1 − β = 0.70 1 − β = 0.80
C C
p a 1.00 1.25 1.50 1.75 2.00 2.50 3.00 1.00 1.25 1.50 1.75 2.00 2.50 3.00 
2 10 U7 6 4 4 3.3 49 7 5 4 3 3 
0 4 9 #7 6 5 4 3 7 12 9 7 6 4 4 
Ol 21 15 11 9 7 5 5 2% 17 «+13 ~~ «10 8 6 5 
3 10 13 9 5 4 3 3 7 11 8 5 4 3 
0 17 #11 8 7 5 4 3 21 14 10 8 6 5 4 
01 25 17 «+12 «10 8 6 § 30 200«d@#sssa 9 7 5 
4 10 15 100 #7 6 5 4 3 9 13 #9 7 6 4 3 
0 19 13 9 7 6 4 4 23 #15 7 #5 4 
01 28 19 13 =~ «10 8 6 5 33 22 «16 «120«21002=COeTsti‘CS 
5 10 17 8 6 5 4 3 21 14 10 8 6 4 4 
0 21 14 «+10 ~~ 8 6 5 4 2 17 «12 7 5 4 
01 30 2 14 «11 9 6 5 35 23 #17 «+13 «10 =#«7~ «6 
6 10 18 12 9 7 5 4 3 22 #15 «ol 8 5 4 
0 22 #15 #11 8 7 5 4 27 «18 «#613~—«10 8 6 4 
01 32 21 #15 #12 9 #7 = § 33 25 #18 13 «+ 8 6 
7 10 19 13 9 7 6 4. 3 24 16 UI 9 5 4 
0 24 16 11 9 7 5 4 2 «+19 14 10 8 6 5 
01 34 22 #1 #12 «10 #7 5 39 2 «18 «#614061 i BkttC 
8 10 20 13 #10 #7 6 4 3 25 16 12 9 7 5 4 
0 2 146 #12 #9 #7 +5 4 30 20 0«14@ ss 6 5 
Ol 35 2 17 «+13 #10 #7 ~~ 5 41 27 19 15 12 8 6 
9 10 21 14 #10 8 4 4 2% 17 12 9 7 5 4 
0 2 17° «12 8 5 4 3121 0«215~=CsoFT 6 5 
Ol 37 #24 #17 ~«130«61000¢~«Sé~C«CS 43 28 #20 15 12 8 6 
10 10 22 14 «+10 ~~ 8 5 4 27 «+18 #13 ~©10 8 5 4 
05 27 +18 «13 «10 8 6 4 33 21 «15012 6 5 
01 38 2 18 #14 #1 7+ 6 4429 21 #16 #12 «8 «6 
11 10 23 #15 8 5 4 28 #18 +13 10 8 6 4 
05 28 19 413 ~~ «10 6 4 a4 22> 62 7 5 
01 39 2 18 #14 «11° °=«8~«6 46 30 21 #16 #13 9 ~=«7 
13.10 24 16 £11 9 7 5 4 30 200«dAsiaa 8 6 4 
0 30 2 #14 «1 9 6 5 36 240=C«=«aITs'sia2Bs’—iad‘Os—issCS 
01 42 27 #19 #15 #12 48 6 49 32 2 #17 #13 9 ~=«~7 
Table IX (continued)


1 − β = 0.90 1 − β = 0.95
C C
p a 1.00 1.25 1.50 1.75 2.00 2.50 3.00 1.00 1.25 1.50 1.75 2.00 2.50 3.00 


01 52 34 24 18 14 10 60 39 28 21 16 11 
10.10 3523 16 12 10 7 42 27 19 15 11 8 
05 41 27 19 14 1] 8 48 31 a2 17 13 9 
01 54 35 25 19 15 10 62 40 29 21 17 11 
11 .10 360-23 17 13 10 7 43 28 20 15 12 8 
05 42 28 20 15 12 8 50 33 23 17 14 9 
01 55 36 26 19 15 10 64 42 29 22 17 12 


46 30 21 16 12 8 
53 34 24 18 14 10 
68 44 31 23 18 12 


13.10 38 = 25 18 13 11 7 
05 45 29 21 16 12 8 
01 59-38 27 20 16 11 


2 «10 18 12 9 7 6 4 3 23 15 11 8 7 5 4 
.05 23° «15 11 8 7 5 4 27~—s ‘18 13 10 8 6 5 
01 32.2] 15 es 10 7 6 38 = 25 18 14 11 8 6 

3.10 22 15 1] 8 7 5 4 27 —s 18 13 10 8 6 4 
05 27 ~=—s 18 13 10 8 6 5 32;. -2) 15 12 9 7 5 
01 37 = 24 18 13 11 8 6 43 29 20 16 12 9 7 

4 .10 25 =16 12 9 7 5 4 30 = 20 14 11 9 6 5 
.O5 30 = 20 14 11 9 6 5 36 0— 23 17 13 10 7 5 
01 40 27 19 15 12 8 6 47 31 22 17 13 9 7 

5 .10 27 = «18 13 10 8 5 4 oo) 15 12 9 6 5 
05 32.21 15 12 9 6 5 39-25 18 14 11 7 6 
01 43 28 20 15 12 9 7 51 33 23 18 14 10 7 

6 .10 29 «19 14 10 8 6 4 35,23 16 12 10 7 5 
05 34-23 16 12 10 i 5 41 27 19 14 11 8 6 
O01 46 30 21 16 13 9 7 53-35 25 19 15 10 8 

7 ~~ 10 31 = =20 14 11 9 6 5 37 24 17 13 10 5 
05 36 24 17 13 10 7 5 43 28 20 15 12 8 6 
01 48 31 22 17 13 9 7 56 36 26 19 15 10 +8 

8  =.10 32° 2) 15 11 9 6 5 39,25 18 14 11 7 5 
05 38 = 25 18 13 1] 7 6 45 29 21 16 12 8 6 
O01 50 =. 33 23 17 14 9 7 58 =. 38 27 20 16 ll 8 

9  .10 33, 22 16 12 9 6 5 40 26 19 14 11 8 6 
05 40 26 18 14 11 8 6 47 30 22 16 13 9 6 

7 8 
5 6 
6 7 
a 8 
5 6 
6 7 
8 9 
5 6 
6 7 
8 9 


From T. L. Bratcher, A. M. Moran, and W. J. Zimmer, “Tables of Sample Sizes
in the Analysis of Variance,” Journal of Quality Technology, 2 (1970), 156-164.
Abridged and adapted by permission. The adaptation is due to R. E. Kirk,
Experimental Design, Third Edition, © 1995 by Brooks/Cole, Monterey, CA.
Table X. Critical Values of the Studentized Range Distribution


This table gives the critical values of the Studentized range distribution used in
multiple comparisons. The critical values are designated as q[p, ν; 1 − α]
corresponding to a given value of α, with p the total number of treatment groups
(or r the number of steps between ordered means) and ν the number of degrees
of freedom for error. The critical values are given for α = 0.05, 0.01;
p = 2 (1) 20; and ν = 2 (1) 20, 24, 30, 40, 60, 120, ∞. For example, for
α = 0.05, p = 4, and ν = 20, the required critical value is q[4, 20; 0.95] = 3.96.
Number of Means (p) or Number of Steps Between Ordered Means (r)

  ν     α      2      3      4      5      6      7      8      9     10     11
  2   .05   6.08   8.33   9.80  10.90  11.70  12.40  13.00  13.50  14.00  14.40
      .01  14.00  19.00  22.30  24.70  26.60  28.20  29.50  30.70  31.70  32.60
  3   .05   4.50   5.91   6.82   7.50   8.04   8.48   8.85   9.18   9.46   9.72
      .01   8.26  10.60  12.20  13.30  14.20  15.00  15.60  16.20  16.70  17.80
  4   .05   3.93   5.04   5.76   6.29   6.71   7.05   7.35   7.60   7.83   8.03
      .01   6.51   8.12   9.17   9.96  10.60  11.10  11.50  11.90  12.30  12.60
  5   .05   3.64   4.60   5.22   5.67   6.03   6.33   6.58   6.80   6.99   7.17
      .01   5.70   6.98   7.80   8.42   8.91   9.32   9.67   9.97  10.24  10.48
  6   .05   3.46   4.34   4.90   5.30   5.63   5.90   6.12   6.32   6.49   6.65
      .01   5.24   6.33   7.03   7.56   7.97   8.32   8.61   8.87   9.10   9.30
  7   .05   3.34   4.16   4.68   5.06   5.36   5.61   5.82   6.00   6.16   6.30
      .01   4.95   5.92   6.54   7.01   7.37   7.68   7.94   8.17   8.37   8.55
  8   .05   3.26   4.04   4.53   4.89   5.17   5.40   5.60   5.77   5.92   6.05
      .01   4.75   5.64   6.20   6.62   6.96   7.24   7.47   7.68   7.86   8.03
  9   .05   3.20   3.95   4.41   4.76   5.02   5.24   5.43   5.59   5.74   5.87
      .01   4.60   5.43   5.96   6.35   6.66   6.91   7.13   7.33   7.49   7.65
 10   .05   3.15   3.88   4.33   4.65   4.91   5.12   5.30   5.46   5.60   5.72
      .01   4.48   5.27   5.77   6.14   6.43   6.67   6.87   7.05   7.21   7.36
 11   .05   3.11   3.82   4.26   4.57   4.82   5.03   5.20   5.35   5.49   5.61
      .01   4.39   5.15   5.62   5.97   6.25   6.48   6.67   6.84   6.99   7.13
 12   .05   3.08   3.77   4.20   4.51   4.75   4.95   5.12   5.27   5.39   5.51
      .01   4.32   5.05   5.50   5.84   6.10   6.32   6.51   6.67   6.81   6.94
 13   .05   3.06   3.73   4.15   4.45   4.69   4.88   5.05   5.19   5.32   5.43
      .01   4.26   4.96   5.40   5.73   5.98   6.19   6.37   6.53   6.67   6.79
 14   .05   3.03   3.70   4.11   4.41   4.64   4.83   4.99   5.13   5.25   5.36
      .01   4.21   4.89   5.32   5.63   5.88   6.08   6.26   6.41   6.54   6.66
 15   .05   3.01   3.67   4.08   4.37   4.59   4.78   4.94   5.08   5.20   5.31
      .01   4.17   4.84   5.25   5.56   5.80   5.99   6.16   6.31   6.44   6.55
 16   .05   3.00   3.65   4.05   4.33   4.56   4.74   4.90   5.03   5.15   5.26
      .01   4.13   4.79   5.19   5.49   5.72   5.92   6.08   6.22   6.35   6.46
 17   .05   2.98   3.63   4.02   4.30   4.52   4.70   4.86   4.99   5.11   5.21
      .01   4.10   4.74   5.14   5.43   5.66   5.85   6.01   6.15   6.27   6.38
 18   .05   2.97   3.61   4.00   4.28   4.49   4.67   4.82   4.96   5.07   5.17
      .01   4.07   4.70   5.09   5.38   5.60   5.79   5.94   6.08   6.20   6.31
 19   .05   2.96   3.59   3.98   4.25   4.47   4.65   4.79   4.92   5.04   5.14
      .01   4.05   4.67   5.05   5.33   5.55   5.73   5.89   6.02   6.14   6.25
 20   .05   2.95   3.58   3.96   4.23   4.45   4.62   4.77   4.90   5.01   5.11
      .01   4.02   4.64   5.02   5.29   5.51   5.69   5.84   5.97   6.09   6.19
 24   .05   2.92   3.53   3.90   4.17   4.37   4.54   4.68   4.81   4.92   5.01
      .01   3.96   4.55   4.91   5.17   5.37   5.54   5.69   5.81   5.92   6.02
 30   .05   2.89   3.49   3.85   4.10   4.30   4.46   4.60   4.72   4.82   4.92
      .01   3.89   4.45   4.80   5.05   5.24   5.40   5.54   5.65   5.76   5.85
 40   .05   2.86   3.44   3.79   4.04   4.23   4.39   4.52   4.63   4.73   4.82
      .01   3.82   4.37   4.70   4.93   5.11   5.26   5.39   5.50   5.60   5.69
 60   .05   2.83   3.40   3.74   3.98   4.16   4.31   4.44   4.55   4.65   4.73
      .01   3.76   4.28   4.59   4.82   4.99   5.13   5.25   5.36   5.45   5.53
120   .05   2.80   3.36   3.68   3.92   4.10   4.24   4.36   4.47   4.56   4.64
      .01   3.70   4.20   4.50   4.71   4.87   5.01   5.12   5.21   5.30   5.37
  ∞   .05   2.77   3.31   3.63   3.86   4.03   4.17   4.29   4.39   4.47   4.55
      .01   3.64   4.12   4.40   4.60   4.76   4.88   4.99   5.08   5.16   5.23


Number of Means (p) or Number of Steps Between Ordered Means (r)

  ν     α     12     13     14     15     16     17     18     19     20
  2   .05  14.70  15.10  15.40  15.70  15.90  16.10  16.40  16.60  16.80
      .01  33.40  34.10  34.80  35.40  36.00  36.50  37.00  37.50  37.90
  3   .05   9.72  10.20  10.30  10.50  10.70  10.80  11.00  11.10  11.20
      .01  17.50  17.90  18.20  18.50  18.80  19.10  19.30  19.50  19.80
  4   .05   8.21   8.37   8.52   8.66   8.79   8.91   9.03   9.13   9.23
      .01  12.80  13.10  13.30  13.50  13.70  13.90  14.10  14.20  14.40
  5   .05   7.32   7.47   7.60   7.72   7.83   7.93   8.03   8.12   8.21
      .01  10.70  10.89  11.08  11.24  11.40  11.55  11.68  11.81  11.93
  6   .05   6.79   6.92   7.03   7.14   7.24   7.34   7.43   7.51   7.59
      .01   9.48   9.65   9.81   9.95  10.08  10.21  10.32  10.43  10.54
  7   .05   6.43   6.55   6.66   6.76   6.85   6.94   7.02   7.10   7.17
      .01   8.71   8.86   9.00   9.12   9.24   9.35   9.46   9.55   9.65
  8   .05   6.18   6.29   6.39   6.48   6.57   6.65   6.73   6.80   6.87
      .01   8.18   8.31   8.44   8.55   8.66   8.76   8.85   8.94   9.03
  9   .05   5.98   6.09   6.19   6.28   6.36   6.44   6.51   6.58   6.64
      .01   7.78   7.91   8.03   8.13   8.23   8.33   8.41   8.49   8.57
 10   .05   5.83   5.93   6.03   6.11   6.19   6.27   6.34   6.40   6.47
      .01   7.49   7.60   7.71   7.81   7.91   7.99   8.08   8.15   8.23
 11   .05   5.71   5.81   5.90   5.98   6.06   6.13   6.20   6.27   6.33
      .01   7.25   7.36   7.46   7.56   7.65   7.73   7.81   7.88   7.95
 12   .05   5.61   5.71   5.80   5.88   5.95   6.02   6.09   6.15   6.21
      .01   7.06   7.17   7.26   7.36   7.44   7.52   7.59   7.66   7.73
 13   .05   5.53   5.63   5.71   5.79   5.86   5.93   5.99   6.05   6.11
      .01   6.90   7.01   7.10   7.19   7.27   7.35   7.42   7.48   7.55
 14   .05   5.46   5.55   5.64   5.71   5.79   5.85   5.91   5.97   6.03
      .01   6.77   6.87   6.96   7.05   7.13   7.20   7.27   7.33   7.39
 15   .05   5.40   5.49   5.57   5.65   5.72   5.78   5.85   5.90   5.96
      .01   6.66   6.76   6.84   6.93   7.00   7.07   7.14   7.20   7.26
 16   .05   5.35   5.44   5.52   5.59   5.66   5.73   5.79   5.84   5.90
      .01   6.56   6.66   6.74   6.82   6.90   6.97   7.03   7.09   7.15
 17   .05   5.31   5.39   5.47   5.54   5.61   5.67   5.73   5.79   5.84
      .01   6.48   6.57   6.66   6.73   6.81   6.87   6.94   7.00   7.05
 18   .05   5.27   5.35   5.43   5.50   5.57   5.63   5.69   5.74   5.79
      .01   6.41   6.50   6.58   6.65   6.73   6.79   6.85   6.91   6.97
 19   .05   5.23   5.31   5.39   5.46   5.53   5.59   5.65   5.70   5.75
      .01   6.34   6.43   6.51   6.58   6.65   6.72   6.78   6.84   6.89
 20   .05   5.20   5.28   5.36   5.43   5.49   5.55   5.61   5.66   5.71
      .01   6.28   6.37   6.45   6.52   6.59   6.65   6.71   6.77   6.82
 24   .05   5.10   5.18   5.25   5.32   5.38   5.44   5.49   5.55   5.59
      .01   6.11   6.19   6.26   6.33   6.39   6.45   6.51   6.56   6.61
 30   .05   5.00   5.08   5.15   5.21   5.27   5.33   5.38   5.43   5.47
      .01   5.93   6.01   6.08   6.14   6.20   6.26   6.31   6.36   6.41
 40   .05   4.90   4.98   5.04   5.11   5.16   5.22   5.27   5.31   5.36
      .01   5.76   5.83   5.90   5.96   6.02   6.07   6.12   6.16   6.21
 60   .05   4.81   4.88   4.94   5.00   5.06   5.11   5.15   5.20   5.24
      .01   5.60   5.67   5.73   5.78   5.84   5.89   5.93   5.97   6.01
120   .05   4.71   4.78   4.84   4.90   4.95   5.00   5.04   5.09   5.13
      .01   5.44   5.50   5.56   5.61   5.66   5.71   5.75   5.79   5.83
  ∞   .05   4.62   4.68   4.74   4.80   4.85   4.89   4.93   4.97   5.01
      .01   5.29   5.35   5.40   5.45   5.49   5.54   5.57   5.61   5.65

From E. S. Pearson and H. O. Hartley, Biometrika Tables for Statisticians, Vol. I, 
Third Edition, © 1970 by Cambridge University Press, Cambridge. Abridged 
and adapted by permission (from Table 29). 
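Table X is typically used through Tukey's honestly significant difference. The sketch below, assuming a one-way layout with p groups of equal size n and error mean square MSE, computes HSD = q[p, ν; 1 − α]·√(MSE/n) and lists the pairs of means that differ by more than HSD. It also includes a helper for an error degrees of freedom ν that falls between tabulated rows, using linear interpolation in 1/ν; that interpolation rule is a common approximation assumed here, not a procedure given in the text. The group means, MSE, and n are hypothetical; q[4, 20; 0.95] = 3.96 and the ν = 24 and ν = 30 entries for p = 4 (3.90 and 3.85) are taken from Table X.

```python
import math
from itertools import combinations


def tukey_hsd(q_crit: float, mse: float, n: int) -> float:
    """Honestly significant difference for p groups of equal size n.

    q_crit is q[p, nu; 1 - alpha] read from Table X, with nu = error df.
    """
    return q_crit * math.sqrt(mse / n)


def significant_pairs(means, q_crit, mse, n):
    """Return (hsd, index pairs whose means differ by more than hsd)."""
    hsd = tukey_hsd(q_crit, mse, n)
    pairs = [(i, j) for i, j in combinations(range(len(means)), 2)
             if abs(means[i] - means[j]) > hsd]
    return hsd, pairs


def interp_reciprocal_df(nu, nu_lo, q_lo, nu_hi, q_hi):
    """Approximate q at an untabulated nu by linear interpolation in 1/nu.

    nu_hi may be math.inf (the infinity row), since 1 / math.inf == 0.0.
    """
    x, x_lo, x_hi = 1.0 / nu, 1.0 / nu_lo, 1.0 / nu_hi
    return q_lo + (q_hi - q_lo) * (x_lo - x) / (x_lo - x_hi)


# Hypothetical example: p = 4 groups of n = 6, so nu = 4 * (6 - 1) = 20.
means = [20.0, 23.5, 26.0, 21.0]
hsd, pairs = significant_pairs(means, q_crit=3.96, mse=8.0, n=6)
print(f"HSD = {hsd:.3f}, significant pairs: {pairs}")

# q[4, 27; 0.95] interpolated between the nu = 24 (3.90) and nu = 30 (3.85) rows.
print(round(interp_reciprocal_df(27, 24, 3.90, 30, 3.85), 3))
```

With these inputs HSD is about 4.573, so only the pairs (0, 2) and (2, 3) are flagged, and the interpolated value for ν = 27 is about 3.872.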
Table XI. Critical Values of Dunnett’s Test


This table gives the critical values of Dunnett’s test used in comparing all
treatment means to a control mean. The critical values are designated as
D[p, ν; 1 − α] corresponding to a given value of α, with p the number of
treatment groups excluding the control and ν the number of degrees of freedom
for error. The critical values are given for one- and two-tailed tests at
α = 0.05, 0.01; p = 1 (1) 9; and ν = 5 (1) 20, 24, 30, 40, 60, 120, ∞. When the
researcher is comparing all treatment means to a control, the question often
is whether the treatment is better than the control. In this situation, one-tailed
critical values should be used. If the researcher wants to test whether the
treatment means simply differ from the control, in either direction, two-tailed
critical values are more appropriate.
One-Tailed Comparison
Number of Treatment Means, Excluding the Control (p)

  ν     α     1     2     3     4     5     6     7     8     9
  5   .05  2.02  2.44  2.68  2.85  2.98  3.08  3.16  3.24  3.30
      .01  3.37  3.90  4.21  4.43  4.60  4.73  4.85  4.94  5.03
  6   .05  1.94  2.34  2.56  2.71  2.83  2.92  3.00  3.07  3.12
      .01  3.14  3.61  3.88  4.07  4.21  4.33  4.43  4.51  4.59
  7   .05  1.89  2.27  2.48  2.62  2.73  2.82  2.89  2.95  3.01
      .01  3.00  3.42  3.66  3.83  3.96  4.07  4.15  4.23  4.30
  8   .05  1.86  2.22  2.42  2.55  2.66  2.74  2.81  2.87  2.92
      .01  2.90  3.29  3.51  3.67  3.79  3.88  3.96  4.03  4.09
  9   .05  1.83  2.18  2.37  2.50  2.60  2.68  2.75  2.81  2.86
      .01  2.82  3.19  3.40  3.55  3.66  3.75  3.82  3.89  3.94
 10   .05  1.81  2.15  2.34  2.47  2.56  2.64  2.70  2.76  2.81
      .01  2.76  3.11  3.31  3.45  3.56  3.64  3.71  3.78  3.83
 11   .05  1.80  2.13  2.31  2.44  2.53  2.60  2.67  2.72  2.77
      .01  2.72  3.06  3.25  3.38  3.48  3.56  3.63  3.69  3.74
 12   .05  1.78  2.11  2.29  2.41  2.50  2.58  2.64  2.69  2.74
      .01  2.68  3.01  3.19  3.32  3.42  3.50  3.56  3.62  3.67
 13   .05  1.77  2.09  2.27  2.39  2.48  2.55  2.61  2.66  2.71
      .01  2.65  2.97  3.15  3.27  3.37  3.44  3.51  3.56  3.61
 14   .05  1.76  2.08  2.25  2.37  2.46  2.53  2.59  2.64  2.69
      .01  2.62  2.94  3.11  3.23  3.32  3.40  3.46  3.51  3.56
 15   .05  1.75  2.07  2.24  2.36  2.44  2.51  2.57  2.62  2.67
      .01  2.60  2.91  3.08  3.20  3.29  3.36  3.42  3.47  3.52
 16   .05  1.75  2.06  2.23  2.34  2.43  2.50  2.56  2.61  2.65
      .01  2.58  2.88  3.05  3.17  3.26  3.33  3.39  3.44  3.48
 17   .05  1.74  2.05  2.22  2.33  2.42  2.49  2.54  2.59  2.64
      .01  2.57  2.86  3.03  3.14  3.23  3.30  3.36  3.41  3.45
 18   .05  1.73  2.05  2.21  2.32  2.41  2.48  2.53  2.58  2.62
      .01  2.55  2.84  3.01  3.12  3.21  3.27  3.33  3.38  3.42
 19   .05  1.73  2.03  2.20  2.31  2.40  2.47  2.52  2.57  2.61
      .01  2.54  2.83  2.99  3.10  3.18  3.25  3.31  3.36  3.40
 20   .05  1.72  2.03  2.19  2.30  2.39  2.46  2.51  2.56  2.60
      .01  2.53  2.81  2.97  3.08  3.17  3.23  3.29  3.34  3.38
 24   .05  1.71  2.01  2.17  2.28  2.36  2.43  2.48  2.53  2.57
      .01  2.49  2.77  2.92  3.03  3.11  3.17  3.22  3.27  3.31
 30   .05  1.70  1.99  2.15  2.25  2.33  2.40  2.45  2.50  2.54
      .01  2.46  2.72  2.87  2.97  3.05  3.11  3.16  3.21  3.24
 40   .05  1.68  1.97  2.13  2.23  2.31  2.37  2.42  2.47  2.51
      .01  2.42  2.68  2.82  2.92  2.99  3.05  3.10  3.14  3.18
 60   .05  1.67  1.95  2.10  2.21  2.28  2.35  2.39  2.44  2.48
      .01  2.39  2.64  2.78  2.87  2.94  3.00  3.04  3.08  3.12
120   .05  1.66  1.93  2.08  2.18  2.26  2.32  2.37  2.41  2.45
      .01  2.36  2.60  2.73  2.82  2.89  2.94  2.99  3.03  3.06
  ∞   .05  1.64  1.92  2.06  2.16  2.23  2.29  2.34  2.38  2.42
      .01  2.33  2.56  2.68  2.77  2.84  2.89  2.93  2.97  3.00
Two-Tailed Comparison
Number of Treatment Means, Excluding the Control (p)

  ν     α     1     2     3     4     5     6     7     8     9
  5   .05  2.57  3.03  3.29  3.48  3.62  3.73  3.82  3.90  3.97
      .01  4.03  4.63  4.98  5.22  5.41  5.56  5.69  5.80  5.89
  6   .05  2.45  2.86  3.10  3.26  3.39  3.49  3.57  3.64  3.71
      .01  3.71  4.21  4.51  4.71  4.87  5.00  5.10  5.20  5.28
  7   .05  2.36  2.75  2.97  3.12  3.24  3.33  3.41  3.47  3.53
      .01  3.50  3.95  4.21  4.39  4.53  4.64  4.74  4.82  4.89
  8   .05  2.31  2.67  2.88  3.02  3.13  3.22  3.29  3.35  3.41
      .01  3.36  3.77  4.00  4.17  4.29  4.40  4.48  4.56  4.62
  9   .05  2.26  2.61  2.81  2.95  3.05  3.14  3.20  3.26  3.32
      .01  3.25  3.63  3.85  4.01  4.12  4.22  4.30  4.37  4.43
 10   .05  2.23  2.57  2.76  2.89  2.99  3.07  3.14  3.19  3.24
      .01  3.17  3.53  3.74  3.88  3.99  4.08  4.16  4.22  4.28
 11   .05  2.20  2.53  2.72  2.84  2.94  3.02  3.08  3.14  3.19
      .01  3.11  3.45  3.65  3.79  3.89  3.98  4.05  4.11  4.16
 12   .05  2.18  2.50  2.68  2.81  2.90  2.98  3.04  3.09  3.14
      .01  3.05  3.39  3.58  3.71  3.81  3.89  3.96  4.02  4.07
 13   .05  2.16  2.48  2.65  2.78  2.87  2.94  3.00  3.06  3.10
      .01  3.01  3.33  3.52  3.65  3.74  3.82  3.89  3.94  3.99
 14   .05  2.14  2.46  2.63  2.75  2.84  2.91  2.97  3.02  3.07
      .01  2.98  3.29  3.47  3.59  3.69  3.76  3.83  3.88  3.93
 15   .05  2.13  2.44  2.61  2.73  2.82  2.89  2.95  3.00  3.04
      .01  2.95  3.25  3.43  3.55  3.64  3.71  3.78  3.83  3.88
 16   .05  2.12  2.42  2.59  2.71  2.80  2.87  2.92  2.97  3.02
      .01  2.92  3.22  3.39  3.51  3.60  3.67  3.73  3.78  3.83
 17   .05  2.11  2.41  2.58  2.69  2.78  2.85  2.90  2.95  3.00
      .01  2.90  3.19  3.36  3.47  3.56  3.63  3.69  3.74  3.79
 18   .05  2.10  2.40  2.56  2.68  2.76  2.83  2.89  2.94  2.98
      .01  2.88  3.17  3.33  3.44  3.53  3.60  3.66  3.71  3.75
 19   .05  2.09  2.39  2.55  2.66  2.75  2.81  2.87  2.92  2.96
      .01  2.86  3.15  3.31  3.42  3.50  3.57  3.63  3.68  3.72
 20   .05  2.09  2.38  2.54  2.65  2.73  2.80  2.86  2.90  2.95
      .01  2.85  3.13  3.29  3.40  3.48  3.55  3.60  3.65  3.69
 24   .05  2.06  2.35  2.51  2.61  2.70  2.76  2.81  2.86  2.90
      .01  2.80  3.07  3.22  3.32  3.40  3.47  3.52  3.57  3.61
 30   .05  2.04  2.32  2.47  2.58  2.66  2.72  2.77  2.82  2.86
      .01  2.75  3.01  3.15  3.25  3.33  3.39  3.44  3.49  3.52
 40   .05  2.02  2.29  2.44  2.54  2.62  2.68  2.73  2.77  2.81
      .01  2.70  2.95  3.09  3.19  3.26  3.32  3.37  3.41  3.44
 60   .05  2.00  2.27  2.41  2.51  2.58  2.64  2.69  2.73  2.77
      .01  2.66  2.90  3.03  3.12  3.19  3.25  3.29  3.33  3.37
120   .05  1.98  2.24  2.38  2.47  2.55  2.60  2.65  2.69  2.73
      .01  2.62  2.85  2.97  3.06  3.12  3.18  3.22  3.26  3.29
  ∞   .05  1.96  2.21  2.35  2.44  2.51  2.57  2.61  2.65  2.69
      .01  2.58  2.79  2.92  3.00  3.06  3.11  3.15  3.19  3.22


From C. W. Dunnett, “A Multiple Comparison Procedure for Comparing Several 
Treatments with a Control,” Journal of the American Statistical Association, 50 
(1955), 1096-1121 and C. W. Dunnett, “New Tables for Multiple Comparisons 
with a Control,” Biometrics, 20 (1964), 482-491. Reprinted by permission.
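A minimal sketch of how a Table XI entry is used: each treatment mean is compared with the control mean against the critical difference D[p, ν; 1 − α]·√(2·MSE/n), assuming equal group sizes n. The one-tailed branch assumes that "better" means a larger mean. The means, MSE, and n are hypothetical, while D = 2.19 is the one-tailed α = 0.05 entry for p = 3 and ν = 20.

```python
import math


def dunnett_critical_difference(d_crit: float, mse: float, n: int) -> float:
    """Critical difference D[p, nu; 1 - alpha] * sqrt(2 * MSE / n), equal n."""
    return d_crit * math.sqrt(2.0 * mse / n)


def compare_with_control(control, treatments, d_crit, mse, n, one_tailed=True):
    """Return (critical difference, indices of treatments declared different).

    one_tailed=True tests treatment - control > critical difference
    (a larger mean is assumed to be better); otherwise |difference| is used.
    """
    cd = dunnett_critical_difference(d_crit, mse, n)
    hits = []
    for k, mean in enumerate(treatments):
        diff = mean - control
        stat = diff if one_tailed else abs(diff)
        if stat > cd:
            hits.append(k)
    return cd, hits


# Hypothetical: a control plus p = 3 treatments, n = 6 observations each,
# so nu = 4 * (6 - 1) = 20 error degrees of freedom.
cd, hits = compare_with_control(20.0, [23.0, 24.5, 21.5], d_crit=2.19, mse=8.0, n=6)
print(f"critical difference = {cd:.3f}, treatments beating control: {hits}")
```

Here the critical difference is about 3.576, so only the second treatment (index 1) exceeds the control by more than that amount. For a two-sided question, pass one_tailed=False together with the corresponding two-tailed entry from Table XI.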
Table XII. Critical Values of Duncan’s Multiple Range Test


This table gives the critical values of Duncan’s multiple range test, which uses
protection level α for the collection of all tests. The critical values are designated
as R[r, ν; 1 − α] corresponding to a given level α, the number of means in
the range being tested or the number of steps apart of two means in an ordered
sequence (r), and the number of degrees of freedom for error (ν). The critical
values are given for α = 0.05, 0.01; r = 2 (1) 10 (2) 20, 50, 100; and
ν = 1 (1) 20 (2) 30, 40, 60, 100, ∞. For example, for α = 0.01, r = 3, and
ν = 13, the required critical value is obtained as R[3, 13; 0.99] = 4.48.
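The sketch below shows one way to apply Duncan's test with Table XII values, assuming equal group sizes n: the shortest significant range for r means is R[r, ν; 1 − α]·√(MSE/n), ranges are examined from the widest to the narrowest, and, following the usual rule, no pair lying inside a range already judged non-significant is declared different. The example takes ν = 13 and α = 0.01, using R[3, 13; 0.99] = 4.48 from the text and R[2, 13; 0.99] = 4.26; the latter is the Table X value q[2, 13; 0.99], since for r = 2 Duncan's critical value coincides with the Studentized range value at the same α. The means, MSE, and n are hypothetical.

```python
import math


def duncan_test(means, r_crit, mse, n):
    """Duncan's multiple range test for equal group sizes n.

    r_crit maps r (number of means spanned) to R[r, nu; 1 - alpha].
    Returns sorted pairs of original indices declared different.
    """
    se = math.sqrt(mse / n)
    order = sorted(range(len(means)), key=lambda k: means[k])
    m = [means[k] for k in order]
    k = len(m)
    nonsig = []          # (i, j) spans of sorted positions found not significant
    significant = []
    for r in range(k, 1, -1):                  # widest ranges first
        for i in range(k - r + 1):
            j = i + r - 1
            # A range contained in a non-significant range is not tested.
            if any(a <= i and j <= b for a, b in nonsig):
                continue
            if m[j] - m[i] > r_crit[r] * se:
                significant.append(tuple(sorted((order[i], order[j]))))
            else:
                nonsig.append((i, j))
    return sorted(significant)


# nu = 13, alpha = 0.01; the means, MSE and n are hypothetical.
R = {2: 4.26, 3: 4.48}
print(duncan_test([10.0, 14.0, 14.5], R, mse=4.0, n=5))
```

With these inputs the shortest significant ranges are about 3.810 (r = 2) and 4.007 (r = 3), so the lowest mean is declared different from each of the other two, while the two upper means are not separated.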
Table XIII. Critical Values of the Bonferroni t Statistic and Dunn’s
Multiple Comparison Test


This table gives the critical values of the Bonferroni t statistic and Dunn’s
multiple comparison procedure. The critical values are given for α = 0.05, 0.01;
the number of comparisons p = 1 (1) 10 (5) 20; and the error degrees of freedom
ν = 2 (1) 30 (5) 60 (10) 120, 250, 500, 1000, ∞. Each entry is the upper
100α/(2p) percentage point of Student’s t distribution with ν degrees of freedom,
so that each of the p two-sided comparisons is made at level α/p; the row labeled
100(α/p) gives this per-comparison level in percent. For example, for α = 0.05,
p = 5, and ν = 10, the desired critical value is obtained as 3.1693.
α_Bon = 0.05        α_ind = 0.05/p
Number of comparisons (p)

       p       1       2       3       4       5       6       7       8       9      10      15      20
100(α/p)  5.0000  2.5000  1.6667  1.2500  1.0000  0.8333  0.7143  0.6250  0.5556  0.5000  0.3333  0.2500
       ν
       2  4.3027  6.2053  7.6488  8.8602  9.9248 10.8859 11.7687 12.5897 13.3604 14.0890 17.2772 19.9625
       3  3.1824  4.1765  4.8567  5.3919  5.8409  6.2315  6.5797  6.8952  7.1849  7.4533  8.5752  9.4649
       4  2.7764  3.4954  3.9608  4.3147  4.6041  4.8510  5.0675  5.2611  5.4366  5.5976  6.2541  6.7583
       5  2.5706  3.1634  3.5341  3.8100  4.0321  4.2193  4.3818  4.5257  4.6553  4.7733  5.2474  5.6042
       6  2.4469  2.9687  3.2875  3.5212  3.7074  3.8630  3.9971  4.1152  4.2209  4.3168  4.6979  4.9807
       7  2.3646  2.8412  3.1276  3.3353  3.4995  3.6358  3.7527  3.8552  3.9467  4.0293  4.3553  4.5946
       8  2.3060  2.7515  3.0158  3.2060  3.3554  3.4789  3.5844  3.6766  3.7586  3.8325  4.1224  4.3335
       9  2.2622  2.6850  2.9333  3.1109  3.2498  3.3642  3.4616  3.5465  3.6219  3.6897  3.9542  4.1458
      10  2.2281  2.6338  2.8701  3.0382  3.1693  3.2768  3.3682  3.4477  3.5182  3.5814  3.8273  4.0045
      11  2.2010  2.5931  2.8200  2.9809  3.1058  3.2081  3.2949  3.3702  3.4368  3.4966  3.7283  3.8945
      12  2.1788  2.5600  2.7795  2.9345  3.0545  3.1527  3.2357  3.3078  3.3714  3.4284  3.6489  3.8065
      13  2.1604  2.5326  2.7459  2.8961  3.0123  3.1070  3.1871  3.2565  3.3177  3.3725  3.5838  3.7345
      14  2.1448  2.5096  2.7178  2.8640  2.9768  3.0688  3.1464  3.2135  3.2727  3.3257  3.5296  3.6746
      15  2.1314  2.4899  2.6937  2.8366  2.9467  3.0363  3.1118  3.1771  3.2346  3.2860  3.4837  3.6239
      16  2.1199  2.4729  2.6730  2.8131  2.9208  3.0083  3.0821  3.1458  3.2019  3.2520  3.4443  3.5805
      17  2.1098  2.4581  2.6550  2.7925  2.8982  2.9840  3.0563  3.1186  3.1735  3.2224  3.4102  3.5429
      18  2.1009  2.4450  2.6391  2.7745  2.8784  2.9627  3.0336  3.0948  3.1486  3.1966  3.3804  3.5101
      19  2.0930  2.4334  2.6251  2.7586  2.8609  2.9439  3.0136  3.0738  3.1266  3.1737  3.3540  3.4812
      20  2.0860  2.4231  2.6126  2.7444  2.8453  2.9271  2.9958  3.0550  3.1070  3.1534  3.3306  3.4554
      21  2.0796  2.4138  2.6013  2.7316  2.8314  2.9121  2.9799  3.0382  3.0895  3.1352  3.3097  3.4325
      22  2.0739  2.4055  2.5912  2.7201  2.8188  2.8985  2.9655  3.0231  3.0737  3.1188  3.2909  3.4118
      23  2.0687  2.3979  2.5820  2.7097  2.8073  2.8863  2.9525  3.0095  3.0595  3.1040  3.2739  3.3931
      24  2.0639  2.3909  2.5736  2.7002  2.7969  2.8751  2.9406  2.9970  3.0465  3.0905  3.2584  3.3761
      25  2.0595  2.3846  2.5660  2.6916  2.7874  2.8649  2.9298  2.9856  3.0346  3.0782  3.2443  3.3606
      26  2.0555  2.3788  2.5589  2.6836  2.7787  2.8555  2.9199  2.9752  3.0237  3.0669  3.2313  3.3464
      27  2.0518  2.3734  2.5525  2.6763  2.7707  2.8469  2.9107  2.9656  3.0137  3.0565  3.2194  3.3334
      28  2.0484  2.3685  2.5465  2.6695  2.7633  2.8389  2.9023  2.9567  3.0045  3.0469  3.2084  3.3214
      29  2.0452  2.3638  2.5409  2.6632  2.7564  2.8316  2.8945  2.9485  2.9959  3.0380  3.1982  3.3102
      30  2.0423  2.3596  2.5357  2.6574  2.7500  2.8247  2.8872  2.9409  2.9880  3.0298  3.1888  3.2999
      35  2.0301  2.3420  2.5145  2.6334  2.7238  2.7966  2.8575  2.9097  2.9554  2.9960  3.1502  3.2577
      40  2.0211  2.3289  2.4989  2.6157  2.7045  2.7759  2.8355  2.8867  2.9314  2.9712  3.1218  3.2266
      45  2.0141  2.3189  2.4868  2.6021  2.6896  2.7599  2.8187  2.8690  2.9130  2.9521  3.1000  3.2028
      50  2.0086  2.3109  2.4772  2.5913  2.6778  2.7473  2.8053  2.8550  2.8984  2.9370  3.0828  3.1840
      55  2.0040  2.3044  2.4694  2.5825  2.6682  2.7370  2.7944  2.8436  2.8866  2.9247  3.0688  3.1688
      60  2.0003  2.2990  2.4630  2.5752  2.6603  2.7286  2.7855  2.8342  2.8768  2.9146  3.0573  3.1562
      70  1.9944  2.2906  2.4529  2.5639  2.6479  2.7153  2.7715  2.8195  2.8615  2.8987  3.0393  3.1366
      80  1.9901  2.2844  2.4454  2.5554  2.6387  2.7054  2.7610  2.8086  2.8502  2.8870  3.0259  3.1220
      90  1.9867  2.2795  2.4395  2.5489  2.6316  2.6978  2.7530  2.8002  2.8414  2.8779  3.0156  3.1108
     100  1.9840  2.2757  2.4349  2.5437  2.6259  2.6918  2.7466  2.7935  2.8344  2.8707  3.0073  3.1018
     110  1.9818  2.2725  2.4311  2.5394  2.6213  2.6868  2.7414  2.7880  2.8287  2.8648  3.0007  3.0945
     120  1.9799  2.2699  2.4280  2.5359  2.6174  2.6827  2.7370  2.7835  2.8240  2.8599  2.9951  3.0885
     250  1.9695  2.2550  2.4102  2.5159  2.5956  2.6594  2.7124  2.7577  2.7972  2.8322  2.9637  3.0543
     500  1.9647  2.2482  2.4021  2.5068  2.5857  2.6488  2.7012  2.7460  2.7850  2.8195  2.9494  3.0387
    1000  1.9623  2.2448  2.3980  2.5022  2.5808  2.6435  2.6957  2.7402  2.7790  2.8133  2.9423  3.0310
       ∞  1.9600  2.2414  2.3940  2.4977  2.5758  2.6383  2.6901  2.7344  2.7729  2.8070  2.9352  3.0233
Table XIII (continued)


α_Bon = 0.01        α_ind = 0.01/p
Number of comparisons (p)

       p       1       2       3       4       5       6       7       8       9      10      15      20
100(α/p)  1.0000  0.5000  0.3333  0.2500  0.2000  0.1667  0.1429  0.1250  0.1111  0.1000  0.0667  0.0500
       ν
       2  9.9248 14.0890 17.2772 19.9625 22.3271 24.4643 26.4292 28.2577 29.9750 31.5991 38.7105 44.7046
       3  5.8409  7.4533  8.5752  9.4649 10.2145 10.8668 11.4532 11.9838 12.4715 12.9240 14.8194 16.3263
       4  4.6041  5.5976  6.2541  6.7583  7.1732  7.5287  7.8414  8.1216  8.3763  8.6103  9.5679 10.3063
       5  4.0321  4.7733  5.2474  5.6042  5.8934  6.1384  6.3518  6.5414  6.7126  6.8688  7.4990  7.9757
       6  3.7074  4.3168  4.6979  4.9807  5.2076  5.3982  5.5632  5.7090  5.8399  5.9588  6.4338  6.7883
       7  3.4995  4.0293  4.3553  4.5946  4.7853  4.9445  5.0815  5.2022  5.3101  5.4079  5.7954  6.0818
       8  3.3554  3.8325  4.1224  4.3335  4.5008  4.6398  4.7590  4.8636  4.9570  5.0413  5.3737  5.6174
       9  3.2498  3.6897  3.9542  4.1458  4.2968  4.4219  4.5288  4.6224  4.7058  4.7809  5.0757  5.2907
      10  3.1693  3.5814  3.8273  4.0045  4.1437  4.2586  4.3567  4.4423  4.5184  4.5869  4.8547  5.0490
      11  3.1058  3.4966  3.7283  3.8945  4.0247  4.1319  4.2232  4.3028  4.3735  4.4370  4.6845  4.8633
      12  3.0545  3.4284  3.6489  3.8065  3.9296  4.0308  4.1169  4.1918  4.2582  4.3178  4.5496  4.7165
      13  3.0123  3.3725  3.5838  3.7345  3.8520  3.9484  4.0302  4.1013  4.1643  4.2208  4.4401  4.5975
      14  2.9768  3.3257  3.5296  3.6746  3.7874  3.8798  3.9582  4.0263  4.0865  4.1405  4.3495  4.4992
      15  2.9467  3.2860  3.4837  3.6239  3.7328  3.8220  3.8975  3.9630  4.0209  4.0728  4.2733  4.4166
      16  2.9208  3.2520  3.4443  3.5805  3.6862  3.7725  3.8456  3.9089  3.9649  4.0150  4.2084  4.3463
      17  2.8982  3.2224  3.4102  3.5429  3.6458  3.7297  3.8007  3.8623  3.9165  3.9651  4.1525  4.2858
      18  2.8784  3.1966  3.3804  3.5101  3.6105  3.6924  3.7616  3.8215  3.8744  3.9216  4.1037  4.2332
      19  2.8609  3.1737  3.3540  3.4812  3.5794  3.6595  3.7271  3.7857  3.8373  3.8834  4.0609  4.1869
      20  2.8453  3.1534  3.3306  3.4554  3.5518  3.6303  3.6966  3.7539  3.8044  3.8495  4.0230  4.1460
      21  2.8314  3.1352  3.3097  3.4325  3.5272  3.6043  3.6693  3.7255  3.7750  3.8193  3.9892  4.1096
      22  2.8188  3.1188  3.2909  3.4118  3.5050  3.5808  3.6448  3.7000  3.7487  3.7921  3.9589  4.0769
      23  2.8073  3.1040  3.2739  3.3931  3.4850  3.5597  3.6226  3.6770  3.7249  3.7676  3.9316  4.0474
      24  2.7969  3.0905  3.2584  3.3761  3.4668  3.5405  3.6025  3.6561  3.7033  3.7454  3.9068  4.0207
      25  2.7874  3.0782  3.2443  3.3606  3.4502  3.5230  3.5842  3.6371  3.6836  3.7251  3.8842  3.9964
      26  2.7787  3.0669  3.2313  3.3464  3.4350  3.5069  3.5674  3.6197  3.6656  3.7066  3.8635  3.9742
      27  2.7707  3.0565  3.2194  3.3334  3.4210  3.4922  3.5520  3.6037  3.6491  3.6896  3.8446  3.9538
      28  2.7633  3.0469  3.2084  3.3214  3.4082  3.4786  3.5378  3.5889  3.6338  3.6739  3.8271  3.9351
      29  2.7564  3.0380  3.1982  3.3102  3.3962  3.4660  3.5247  3.5753  3.6198  3.6594  3.8110  3.9177
      30  2.7500  3.0298  3.1888  3.2999  3.3852  3.4544  3.5125  3.5626  3.6067  3.6460  3.7961  3.9016
      35  2.7238  2.9960  3.1502  3.2577  3.3400  3.4068  3.4628  3.5110  3.5534  3.5911  3.7352  3.8362
      40  2.7045  2.9712  3.1218  3.2266  3.3069  3.3718  3.4263  3.4732  3.5143  3.5510  3.6906  3.7884
      45  2.6896  2.9521  3.1000  3.2028  3.2815  3.3451  3.3984  3.4442  3.4845  3.5203  3.6565  3.7519
      50  2.6778  2.9370  3.0828  3.1840  3.2614  3.3239  3.3763  3.4214  3.4609  3.4960  3.6297  3.7231
      55  2.6682  2.9247  3.0688  3.1688  3.2451  3.3068  3.3585  3.4029  3.4418  3.4764  3.6080  3.6999
      60  2.6603  2.9146  3.0573  3.1562  3.2317  3.2927  3.3437  3.3876  3.4260  3.4602  3.5901  3.6807
      70  2.6479  2.8987  3.0393  3.1366  3.2108  3.2707  3.3208  3.3638  3.4015  3.4350  3.5622  3.6509
      80  2.6387  2.8870  3.0259  3.1220  3.1953  3.2543  3.3037  3.3462  3.3833  3.4163  3.5416  3.6288
      90  2.6316  2.8779  3.0156  3.1108  3.1833  3.2417  3.2906  3.3326  3.3693  3.4019  3.5257  3.6118
     100  2.6259  2.8707  3.0073  3.1018  3.1737  3.2317  3.2802  3.3218  3.3582  3.3905  3.5131  3.5983
     110  2.6213  2.8648  3.0007  3.0945  3.1660  3.2235  3.2717  3.3130  3.3491  3.3812  3.5028  3.5874
     120  2.6174  2.8599  2.9951  3.0885  3.1595  3.2168  3.2646  3.3057  3.3416  3.3735  3.4943  3.5783
     250  2.5956  2.8322  2.9637  3.0543  3.1232  3.1785  3.2248  3.2644  3.2991  3.3299  3.4462  3.5270
     500  2.5857  2.8195  2.9494  3.0387  3.1066  3.1612  3.2067  3.2457  3.2798  3.3101  3.4245  3.5037
    1000  2.5808  2.8133  2.9423  3.0310  3.0984  3.1526  3.1977  3.2365  3.2703  3.3003  3.4137  3.4922
       ∞  2.5758  2.8070  2.9352  3.0233  3.0902  3.1440  3.1888  3.2272  3.2608  3.2905  3.4029  3.4808


From B. J. R. Bailey, “Tables of the Bonferroni t Statistic,” Journal of the American
Statistical Association, 72 (1977), 469-478. Abridged and reprinted by permission.
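Dunn's procedure uses a Table XIII entry as the multiplier in simultaneous confidence intervals, estimate ± t·SE, for p planned comparisons. The sketch below assumes p = 5 comparisons, α = 0.05, and ν = 10, so that t = 3.1693 (the example in the text); the contrast estimates and their common standard error are hypothetical.

```python
def bonferroni_intervals(estimates, se, t_crit):
    """Simultaneous intervals estimate +/- t_crit * se for each comparison.

    t_crit must be the Table XIII entry for p = len(estimates) comparisons.
    """
    margin = t_crit * se
    return [(est - margin, est + margin) for est in estimates]


def excludes_zero(intervals):
    """Indices of intervals that do not contain 0 (significant comparisons)."""
    return [k for k, (lo, hi) in enumerate(intervals) if lo > 0 or hi < 0]


estimates = [6.0, -2.0, 1.0, -5.5, 3.0]      # hypothetical contrast estimates
intervals = bonferroni_intervals(estimates, se=1.5, t_crit=3.1693)
for est, (lo, hi) in zip(estimates, intervals):
    print(f"{est:5.1f}: ({lo:7.3f}, {hi:7.3f})")
print("significant:", excludes_zero(intervals))
```

The margin is 3.1693 × 1.5 ≈ 4.754, and only the first and fourth intervals exclude zero.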
Table XIV. Critical Values of the Dunn-Šidák Multiple
Comparison Test


This table gives the critical values of the Dunn-Šidák multiple comparison
procedure. The critical values are given for α = 0.01, 0.05, 0.10, 0.20; the number
of comparisons p = 2 (1) 10 (5) 40 (10) 50; and the error degrees of freedom
ν = 2 (1) 30, 40, 60, 120, ∞. For example, for α = 0.05, p = 3, and ν = 12, the
required critical value is obtained as 2.770.
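The Dunn-Šidák and Bonferroni critical values differ only in the per-comparison level: the Šidák procedure uses α′ = 1 − (1 − α)^(1/p), the Bonferroni procedure uses α/p, and in both cases the critical value is the two-sided t point with ν degrees of freedom at that level. The sketch below computes both from the t density using only the standard library (Simpson's rule for the tail area and bisection for the quantile); this numerical approach is chosen here for self-containment and is not how the tables were produced. For α = 0.05, p = 3, and ν = 12 it should reproduce the Table XIV entry 2.770 and, for comparison, the Table XIII entry 2.7795.

```python
import math


def t_pdf(x: float, nu: int) -> float:
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c) * (1.0 + x * x / nu) ** (-(nu + 1) / 2)


def t_upper_tail(x: float, nu: int, steps: int = 2000) -> float:
    """P(T > x) for x >= 0, via Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    return 0.5 - total * h / 3


def t_two_sided_critical(alpha_pc: float, nu: int) -> float:
    """Value c with P(|T| > c) = alpha_pc, found by bisection on the upper tail."""
    tail = alpha_pc / 2
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, nu) > tail:     # widen the bracket until it holds c
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, nu) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sidak_alpha(alpha: float, p: int) -> float:
    """Per-comparison level of the Dunn-Sidak procedure."""
    return 1.0 - (1.0 - alpha) ** (1.0 / p)


def bonferroni_alpha(alpha: float, p: int) -> float:
    """Per-comparison level of the Bonferroni (Dunn) procedure."""
    return alpha / p


alpha, p, nu = 0.05, 3, 12
print("Dunn-Sidak:", round(t_two_sided_critical(sidak_alpha(alpha, p), nu), 3))
print("Bonferroni:", round(t_two_sided_critical(bonferroni_alpha(alpha, p), nu), 4))
```

Because 1 − (1 − α)^(1/p) is slightly larger than α/p, the Dunn-Šidák value is always a little smaller than the corresponding Bonferroni value, which is the pattern seen when comparing Tables XIII and XIV.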
Table XV. Critical Values of the Studentized Maximum Modulus
Distribution


This table gives the critical values of the Studentized maximum modulus
distribution used in multiple comparisons. The critical values are given for
α = 0.10, 0.05, 0.01; the number of comparisons p = 2 (1) 15; and the error
degrees of freedom ν = 2 (1) 12 (2) 20, 24, 30, 40, 60, ∞. For example, for
α = 0.05, p = 3, and ν = 12, the required critical value is obtained as 2.75.


Number of comparisons (p)

  ν     α     2     3     4     5     6     7     8     9    10    11    12    13    14    15
  2  0.10  3.83  4.38  4.77  5.06  5.30  5.50  5.67  5.82  5.96  6.08  6.18  6.28  6.37  6.45
     0.05  5.57  6.34  6.89  7.31  7.65  7.93  8.17  8.38  8.57  8.74  8.89  9.03  9.16  9.28
     0.01 12.73 14.44 15.65 16.59 17.35 17.99 18.53 19.01 19.43 19.81 20.15 20.46 20.75 21.02
  3  0.10  2.99  3.37  3.64  3.84  4.01  4.15  4.27  4.38  4.47  4.55  4.63  4.70  4.76  4.82
     0.05  3.96  4.43  4.76  5.02  5.23  5.41  5.56  5.69  5.81  5.92  6.01  6.10  6.18  6.26
     0.01  7.13  7.91  8.48  8.92  9.28  9.58  9.84 10.06 10.27 10.45 10.61 10.76 10.90 11.03
  4  0.10  2.66  2.98  3.20  3.37  3.51  3.62  3.72  3.81  3.89  3.96  4.02  4.08  4.13  4.18
     0.05  3.38  3.74  4.00  4.20  4.37  4.50  4.62  4.72  4.82  4.90  4.98  5.04  5.11  5.17
     0.01  5.46  5.99  6.36  6.66  6.90  7.10  7.27  7.43  7.57  7.69  7.80  7.91  8.00  8.09
  5  0.10  2.49  2.77  2.96  3.12  3.24  3.34  3.43  3.51  3.58  3.64  3.69  3.75  3.79  3.84
     0.05  3.09  3.40  3.62  3.79  3.93  4.04  4.14  4.23  4.31  4.38  4.45  4.51  4.56  4.61
     0.01  4.70  5.11  5.40  5.63  5.81  5.97  6.11  6.23  6.33  6.43  6.52  6.60  6.67  6.74
  6  0.10  2.39  2.64  2.82  2.96  3.07  3.17  3.25  3.32  3.38  3.44  3.49  3.54  3.58  3.62
     0.05  2.92  3.19  3.39  3.54  3.66  3.77  3.86  3.94  4.01  4.07  4.13  4.18  4.23  4.28
     0.01  4.27  4.61  4.86  5.05  5.20  5.33  5.45  5.55  5.64  5.72  5.80  5.86  5.93  5.99
  7  0.10  2.31  2.56  2.73  2.86  2.96  3.05  3.13  3.19  3.25  3.31  3.35  3.40  3.44  3.48
     0.05  2.80  3.06  3.24  3.38  3.49  3.59  3.67  3.74  3.80  3.86  3.92  3.96  4.01  4.05
     0.01  4.00  4.30  4.51  4.68  4.81  4.93  5.03  5.12  5.20  5.27  5.33  5.39  5.45  5.50
  8  0.10  2.26  2.49  2.66  2.78  2.88  2.97  3.04  3.11  3.16  3.21  3.26  3.30  3.34  3.37
     0.05  2.72  2.96  3.13  3.26  3.36  3.45  3.53  3.60  3.66  3.71  3.76  3.81  3.85  3.89
     0.01  3.81  4.08  4.27  4.42  4.55  4.65  4.74  4.82  4.89  4.96  5.02  5.07  5.12  5.17
  9  0.10  2.22  2.45  2.60  2.72  2.82  2.90  2.97  3.03  3.09  3.13  3.18  3.22  3.26  3.29
     0.05  2.66  2.89  3.05  3.17  3.27  3.36  3.43  3.49  3.55  3.60  3.65  3.69  3.73  3.77
     0.01  3.67  3.92  4.10  4.24  4.35  4.45  4.53  4.61  4.67  4.73  4.79  4.84  4.88  4.92
 10  0.10  2.19  2.41  2.56  2.68  2.77  2.85  2.92  2.98  3.03  3.08  3.12  3.16  3.20  3.23
     0.05  2.61  2.83  2.98  3.10  3.20  3.28  3.35  3.41  3.47  3.52  3.56  3.60  3.64  3.68
     0.01  3.57  3.80  3.97  4.10  4.20  4.29  4.37  4.44  4.50  4.56  4.61  4.66  4.70  4.74
 11  0.10  2.17  2.38  2.53  2.64  2.73  2.81  2.88  2.93  2.98  3.03  3.07  3.11  3.15  3.18
     0.05  2.57  2.78  2.93  3.05  3.14  3.22  3.29  3.35  3.40  3.45  3.49  3.53  3.57  3.60
     0.01  3.48  3.71  3.87  3.99  4.09  4.17  4.25  4.31  4.37  4.42  4.47  4.51  4.55  4.59
 12  0.10  2.15  2.36  2.50  2.61  2.70  2.78  2.84  2.90  2.95  2.99  3.03  3.07  3.10  3.14
     0.05  2.54  2.75  2.89  3.00  3.09  3.17  3.24  3.29  3.34  3.39  3.43  3.47  3.51  3.54
     0.01  3.42  3.63  3.78  3.90  4.00  4.08  4.15  4.21  4.26  4.31  4.36  4.40  4.44  4.48
 14  0.10  2.12  2.32  2.46  2.57  2.65  2.72  2.79  2.84  2.89  2.93  2.97  3.01  3.04  3.07
     0.05  2.49  2.69  2.83  2.94  3.02  3.09  3.16  3.21  3.26  3.30  3.34  3.38  3.41  3.45
     0.01  3.32  3.52  3.66  3.77  3.85  3.93  3.99  4.05  4.10  4.15  4.19  4.23  4.26  4.30
 16  0.10  2.10  2.29  2.43  2.53  2.62  2.69  2.75  2.80  2.85  2.89  2.93  2.96  2.99  3.02
     0.05  2.46  2.65  2.78  2.89  2.97  3.04  3.10  3.15  3.20  3.24  3.28  3.31  3.35  3.38
     0.01  3.25  3.43  3.57  3.67  3.75  3.82  3.88  3.94  3.99  4.03  4.07  4.11  4.14  4.17
Table XV (continued)


Number of comparisons (p)

  ν     α     2     3     4     5     6     7     8     9    10    11    12    13    14    15
 18  0.10  2.08  2.27  2.41  2.51  2.59  2.66  2.72  2.77  2.81  2.85  2.89  2.92  2.96  2.99
     0.05  2.43  2.62  2.75  2.85  2.93  3.00  3.05  3.11  3.15  3.19  3.23  3.26  3.29  3.32
     0.01  3.19  3.37  3.50  3.60  3.68  3.74  3.80  3.85  3.90  3.94  3.98  4.01  4.04  4.07
 20  0.10  2.07  2.26  2.39  2.49  2.57  2.63  2.69  2.74  2.79  2.83  2.86  2.90  2.93  2.96
     0.05  2.41  2.59  2.72  2.82  2.90  2.96  3.02  3.07  3.11  3.15  3.19  3.22  3.25  3.28
     0.01  3.15  3.32  3.45  3.54  3.62  3.68  3.74  3.79  3.83  3.87  3.91  3.94  3.97  4.00
 24  0.10  2.05  2.23  2.36  2.46  2.53  2.60  2.66  2.70  2.75  2.79  2.82  2.85  2.88  2.91
     0.05  2.38  2.56  2.68  2.77  2.85  2.91  2.97  3.02  3.06  3.10  3.13  3.16  3.19  3.22
     0.01  3.09  3.25  3.37  3.46  3.53  3.59  3.64  3.69  3.73  3.77  3.80  3.83  3.86  3.89
 30  0.10  2.03  2.21  2.33  2.43  2.50  2.57  2.62  2.67  2.71  2.75  2.78  2.81  2.84  2.87
     0.05  2.35  2.52  2.64  2.73  2.80  2.87  2.92  2.96  3.00  3.04  3.07  3.11  3.13  3.16
     0.01  3.03  3.18  3.29  3.38  3.45  3.51  3.55  3.60  3.64  3.67  3.70  3.73  3.76  3.78
 40  0.10  2.01  2.18  2.30  2.40  2.47  2.53  2.58  2.63  2.67  2.71  2.74  2.77  2.80  2.82
     0.05  2.32  2.49  2.60  2.69  2.76  2.82  2.87  2.91  2.95  2.99  3.02  3.05  3.08  3.10
     0.01  2.97  3.12  3.22  3.30  3.37  3.42  3.47  3.51  3.54  3.58  3.61  3.63  3.66  3.68
 60  0.10  1.99  2.16  2.28  2.37  2.44  2.50  2.55  2.59  2.63  2.67  2.70  2.73  2.76  2.78
     0.05  2.29  2.45  2.56  2.65  2.72  2.77  2.82  2.86  2.90  2.93  2.96  2.99  3.02  3.04
     0.01  2.91  3.05  3.15  3.23  3.29  3.34  3.38  3.42  3.46  3.49  3.51  3.54  3.56  3.59
  ∞  0.10  1.95  2.11  2.23  2.31  2.38  2.43  2.48  2.52  2.56  2.59  2.62  2.65  2.67  2.70
     0.05  2.24  2.39  2.49  2.57  2.63  2.68  2.73  2.77  2.80  2.83  2.86  2.88  2.91  2.93
     0.01  2.81  2.93  3.02  3.09  3.14  3.19  3.23  3.26  3.29  3.32  3.34  3.36  3.38  3.40


From R. E. Bechhofer and C. W. Dunnett, “Comparisons for Orthogonal Contrasts: Examples and Tables,” Technometrics, 24 (1982), 213-222. Abridged and reprinted by permission.
Table XVI. Critical Values of the Studentized Augmented
Range Distribution 


This table gives the critical values of the Studentized augmented range distribution used in multiple comparisons. The critical values are given for α = 0.20, 0.10, 0.05, 0.01; the number of comparisons p = 2 (1) 8; and the error degrees of freedom ν = 5, 7, 10, 12 (4) 24, 30, 40, 60, 120, ∞. For example, for α = 0.05, p = 4, and ν = 16, the desired critical value is obtained as 4.050.


Number of comparisons (p)
ν α 2 3 4 5 6 7 8
5 .20 2.326 2.935 3.379 3.719 3.991 4.215 4.406
.10 3.060 3.772 4.282 4.671 4.982 5.239 5.458
.05 3.832 4.654 5.236 5.680 6.036 6.331 6.583
.01 5.903 7.030 7.823 8.429 8.916 9.322 9.669
7 .20 2.213 2.783 3.195 3.508 3.757 3.963 4.137
.10 2.848 3.491 3.943 4.285 4.556 4.781 4.972
.05 3.486 4.198 4.692 5.064 5.360 5.606 5.816
.01 5.063 5.947 6.551 7.008 7.374 7.679 7.939
10 .20 2.133 2.676 3.066 3.359 3.592 3.783 3.944
.10 2.704 3.300 3.712 4.021 4.265 4.466 4.636
.05 3.259 3.899 4.333 4.656 4.913 5.124 5.305
.01 4.550 5.284 5.773 6.138 6.428 6.669 6.875
12 .20 2.103 2.636 3.017 3.303 3.530 3.715 3.872
.10 2.651 3.230 3.628 3.924 4.157 4.349 4.511
.05 3.177 3.791 4.204 4.509 4.751 4.950 5.119
.01 4.373 5.056 5.505 5.837 6.101 6.321 6.507
16 .20 2.066 2.587 2.958 3.235 3.453 3.632 3.782
.10 2.588 3.146 3.526 3.806 4.027 4.207 4.360
.05 3.080 3.663 4.050 4.334 4.557 4.741 4.897
.01 4.169 4.792 5.194 5.489 5.722 5.915 6.079
20 .20 2.045 2.558 2.923 3.195 3.408 3.582 3.729
.10 2.551 3.097 3.466 3.738 3.950 4.124 4.271
.05 3.024 3.590 3.961 4.233 4.446 4.620 4.768
.01 4.055 4.644 5.019 5.294 5.510 5.688 5.839
24 .20 2.031 2.539 2.900 3.168 3.378 3.549 3.694
.10 2.527 3.065 3.427 3.693 3.901 4.070 4.213
.05 2.988 3.542 3.904 4.167 4.373 4.541 4.684
.01 3.982 4.549 4.908 5.169 5.374 5.542 5.685
30 .20 2.017 2.521 2.877 3.142 3.348 3.517 3.659
.10 2.503 3.034 3.389 3.649 3.851 4.016 4.155
.05 2.952 3.496 3.847 4.103 4.320 4.464 4.602
.01 3.912 4.458 4.800 5.048 5.242 5.401 5.536
40 .20 2.003 2.502 2.855 3.116 3.319 3.485 3.624
.10 2.480 3.003 3.352 3.605 3.803 3.963 4.099
.05 2.918 3.450 3.792 4.040 4.232 4.389 4.521
.01 3.844 4.370 4.696 4.931 5.115 5.265 5.392
60 .20 1.990 2.484 2.833 3.090 3.290 3.453 3.589
.10 2.457 2.927 3.315 3.563 3.755 3.911 4.042
.05 2.884 3.406 3.738 3.978 4.163 4.314 4.441
.01 3.778 4.284 4.595 4.818 4.991 5.133 5.253
120 .20 1.976 2.466 2.811 3.064 3.261 3.421 3.554
.10 2.434 2.943 3.278 3.520 3.707 3.859 3.987
.05 2.851 3.362 3.686 3.917 4.096 4.241 4.363
.01 3.714 4.201 4.497 4.709 4.872 5.005 5.118
∞ .20 1.963 2.448 2.789 3.039 3.232 3.389 3.520
.10 2.412 2.913 3.243 3.479 3.661 3.808 3.931
.05 2.819 3.320 3.634 3.858 4.030 4.170 4.286
.01 3.653 4.121 4.403 4.603 4.757 4.882 4.987


From M. R. Stoline, “Tables of the Studentized Augmented Range and Applica- 
tions to Problems of Multiple Comparisons,” Journal of the American Statistical 
Association, 73 (1978), 656-660. Adapted and reprinted by permission. 
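The entries of Table XVI can be checked roughly by simulation. The sketch below assumes the usual definition of the Studentized augmented range: the larger of the Studentized range and the Studentized maximum modulus of p independent standard normal variables, where S has ν degrees of freedom. This is equivalent to the range of {0, Z_1, ..., Z_p} divided by S. The function names, seed, and number of draws are illustrative choices, not part of the source. With the defaults, the estimate should land within a few hundredths of the tabled 4.050 for α = 0.05, p = 4, ν = 16.

```python
import math
import random


def augmented_range_draw(p: int, nu: int, rng: random.Random) -> float:
    """One draw of the Studentized augmented range Q'(p, nu).

    Assumed definition: the range of {0, Z_1, ..., Z_p}, which equals
    max(range of the Z_i, max |Z_i|), divided by an independent S with
    nu * S**2 distributed as chi-square(nu).
    """
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    augmented_range = max(max(z), 0.0) - min(min(z), 0.0)
    # chi-square(nu) is a gamma variate with shape nu/2 and scale 2
    s = math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
    return augmented_range / s


def augmented_range_critical_value(p: int, nu: int, alpha: float,
                                   draws: int = 80_000, seed: int = 12345) -> float:
    """Monte Carlo estimate of the upper-alpha point of Q'(p, nu)."""
    rng = random.Random(seed)
    sample = sorted(augmented_range_draw(p, nu, rng) for _ in range(draws))
    k = math.ceil((1.0 - alpha) * draws)  # rank of the empirical (1 - alpha) quantile
    return sample[k - 1]


if __name__ == "__main__":
    # Table XVI lists 4.050 for alpha = 0.05, p = 4, nu = 16.
    estimate = augmented_range_critical_value(4, 16, 0.05)
    print(f"Monte Carlo estimate: {estimate:.3f}")
```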
Table XVII (a). Critical Values of the Distribution of √b1
for Testing Skewness


This table gives the upper-tailed critical values of the sample estimate of the coefficient of skewness (γ1). The critical values are given for α = 0.05, 0.01, and the sample size n = 25 (5) 50 (10) 100 (25) 200 (50) 500. Since the distribution of the statistic √b1 is symmetrical about zero, the one-tailed critical values also represent two-tailed values of 0.10 and 0.02. For example, for α = 0.05 and n = 30, the desired critical value is obtained as 0.661.


Critical Value (α)
n 0.05 0.01 Standard Deviation
25 0.711 1.061 0.4354
30 0.661 0.982 0.4052
35 0.621 0.921 0.3804
40 0.587 0.869 0.3596
45 0.558 0.825 0.3418
50 0.533 0.787 0.3264
60 0.492 0.723 0.3009
70 0.459 0.673 0.2806
80 0.432 0.631 0.2638
90 0.409 0.596 0.2498
100 0.389 0.567 0.2377
125 0.350 0.508 0.2139
150 0.321 0.464 0.1961
175 0.298 0.430 0.1820
200 0.280 0.403 0.1706
250 0.251 0.360 0.1531
300 0.230 0.329 0.1400
350 0.213 0.305 0.1298
400 0.200 0.285 0.1216
450 0.188 0.269 0.1147
500 0.179 0.255 0.1089
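The following is a minimal sketch of the skewness check. It assumes the usual moment definition √b1 = m3/m2^(3/2), where m_k is the k-th sample central moment with divisor n. The helper names are hypothetical. The function `sd_sqrt_b1` uses the exact normal-theory variance 6(n − 2)/[(n + 1)(n + 3)], which reproduces the Standard Deviation column above, for example 0.4354 for n = 25.

```python
import math
from typing import Sequence


def central_moment(x: Sequence[float], k: int) -> float:
    """k-th sample central moment m_k = (1/n) * sum((x_i - mean)**k)."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** k for v in x) / n


def sqrt_b1(x: Sequence[float]) -> float:
    """Sample skewness sqrt(b1) = m3 / m2**1.5."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def sd_sqrt_b1(n: int) -> float:
    """Standard deviation of sqrt(b1) for normal samples of size n."""
    return math.sqrt(6.0 * (n - 2) / ((n + 1) * (n + 3)))


def skewness_rejects(x: Sequence[float], critical_value: float) -> bool:
    """Two-sided use of a one-tailed table value (the 0.05 column gives level 0.10)."""
    return abs(sqrt_b1(x)) > critical_value


if __name__ == "__main__":
    print(f"{sd_sqrt_b1(25):.4f}")  # the table lists 0.4354 for n = 25
```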


Table XVII (b). Critical Values of the Distribution of b2
for Testing Kurtosis


This table gives upper- and lower-tailed critical values of the sample estimate of the coefficient of kurtosis (γ2). The critical values are given for α = 0.05, 0.01, and the sample size n = 50 (25) 150 (50) 1000 (200) 2000. For example, for α = 0.05 and n = 50, the upper-tailed critical value is obtained as 3.99.


Critical Value (α)
Upper Lower
n 0.01 0.05 0.05 0.01
50 4.88 3.99 2.15 1.95
75 4.59 3.87 2.27 2.08
100 4.39 3.77 2.35 2.18
125 4.24 3.71 2.40 2.24
150 4.13 3.65 2.45 2.29
200 3.98 3.57 2.51 2.37
250 3.87 3.52 2.55 2.42
300 3.79 3.47 2.59 2.46
350 3.72 3.44 2.62 2.50
400 3.67 3.41 2.64 2.52
450 3.63 3.39 2.66 2.55
500 3.60 3.37 2.67 2.57
550 3.57 3.35 2.69 2.58
600 3.54 3.34 2.70 2.60
650 3.52 3.33 2.71 2.61
700 3.50 3.31 2.72 2.62
750 3.48 3.30 2.73 2.64
800 3.46 3.29 2.74 2.65
850 3.45 3.28 2.74 2.66
900 3.43 3.28 2.75 2.66
950 3.42 3.27 2.76 2.67
1000 3.41 3.26 2.76 2.68
1200 3.51 3.24 2.78 2.71
1400 3.34 3.22 2.80 2.72
1600 3.32 3.21 2.81 2.74
1800 3.30 3.20 2.82 2.76
2000 3.28 3.18 2.83 2.77


From E. S. Pearson and H. O. Hartley, Biometrika Tables for Statisticians, Vol. I, 
Third Edition, © 1970 by Cambridge University Press, Cambridge. Reprinted 
by permission (from Table 34B and 34C). 
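The kurtosis check can be sketched in the same way. This sketch assumes b2 = m4/m2², which is near 3 for large normal samples (its exact normal-theory mean is 3(n − 1)/(n + 1)). Using the 0.05 column in each tail of Table XVII (b) gives a two-sided test at level 0.10. The function names are illustrative.

```python
from typing import Sequence


def b2(x: Sequence[float]) -> float:
    """Sample kurtosis b2 = m4 / m2**2, using central moments with divisor n."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2


def kurtosis_rejects(x: Sequence[float], lower: float, upper: float) -> bool:
    """Reject normality when b2 falls below the lower or above the upper table value."""
    value = b2(x)
    return value < lower or value > upper


if __name__ == "__main__":
    # For n = 50, the 0.05 columns give lower = 2.15 and upper = 3.99.
    print(b2([-1.0, 1.0, -1.0, 1.0]))  # 1.0, a very light-tailed sample
```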
Table XVIII. Coefficients of Order Statistics for the
Shapiro-Wilk’s W Test for Normality 


The table gives coefficients {a_{n−i+1}} (i = 1, 2, ...) of the order statistics for determining the Shapiro-Wilk’s W statistic. The coefficients are given for n = 2 (1) 30. Shapiro and Wilk (1965) used approximations for n > 20. The values given here are exact up to n = 30.


n Coefficients a_{n−i+1}, in order i = 1, 2, ...
2 0.7071
3 0.7071 0.0000
10 0.5739 0.3290 0.2141 0.1224 0.0399
11 0.5601 0.3315 0.2260 0.1429 0.0695 0.0000
12 0.5475 0.3325 0.2347 0.1586 0.0922 0.0303
19 0.4808 0.3232 0.2561 0.2059 0.1641 0.1271 0.0932 0.0612 0.0303 0.0000
20 0.4734 0.3211 0.2565 0.2085 0.1686 0.1334 0.1013 0.0712 0.0422 0.0140
21 0.4664 0.3189 0.2567 0.2106 0.1724 0.1388 0.1083 0.0798 0.0525 0.0261 0.0000
22 0.4598 0.3167 0.2566 0.2122 0.1756 0.1435 0.1144 0.0873 0.0615 0.0366 0.0121
29 0.4214 0.3014 0.2525 0.2168 0.1878 0.1628 0.1404 0.1200 0.1008 0.0827 0.0654 0.0486 0.0322 0.0160 0.0000
30 0.4168 0.2993 0.2516 0.2169 0.1886 0.1643 0.1427 0.1228 0.1044 0.0869 0.0702 0.0541 0.0383 0.0229 0.0076


From T. J. Lorenzen and V. L. Anderson, Design of Experiments: A No-Name 
Approach, © 1993 by Marcel Dekker, New York. Reprinted by permission. 
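The W statistic is the square of a linear combination of order statistics divided by the sum of squared deviations. The sketch below assumes the standard form W = [Σ a_{n−i+1}(x_(n−i+1) − x_(i))]² / Σ(x_j − x̄)². It hard-codes the n = 10 coefficients from Table XVIII and compares the result with the n = 10, α = 0.05 critical value 0.842 from Table XIX; small values of W indicate non-normality. The function and variable names are hypothetical.

```python
from typing import Sequence

# a_{n-i+1}, i = 1, ..., 5, for n = 10 (Table XVIII)
A_N10 = (0.5739, 0.3290, 0.2141, 0.1224, 0.0399)


def shapiro_wilk_w(x: Sequence[float], a: Sequence[float]) -> float:
    """W from tabled coefficients a[0] = a_n, a[1] = a_{n-1}, ..."""
    xs = sorted(x)
    n = len(xs)
    k = n // 2  # for odd n the middle coefficient is zero and can be skipped
    if len(a) < k:
        raise ValueError(f"need at least {k} coefficients for n = {n}")
    # pair the i-th largest with the i-th smallest observation
    b = sum(a[i] * (xs[n - 1 - i] - xs[i]) for i in range(k))
    mean = sum(xs) / n
    ss = sum((v - mean) ** 2 for v in xs)
    return b * b / ss


if __name__ == "__main__":
    data = [1.0] * 9 + [10.0]  # one extreme value among ten observations
    w = shapiro_wilk_w(data, A_N10)
    print(f"W = {w:.4f}, reject at 0.05: {w < 0.842}")  # W = 0.3660, reject at 0.05: True
```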
Table XIX. Critical Values of the Shapiro-Wilk’s W Test
for Normality 


This table gives critical values of the Shapiro-Wilk’s W test for normality. The critical values are given for α = 0.01, 0.02, 0.05, 0.10, 0.50, and n = 3 (1) 50.


Critical Value (α)
n 0.01 0.02 0.05 0.10 0.50


3 0.753 0.756 0.767 0.789 0.959 
4 0.687 0.707 0.748 0.792 0.935 
5 0.686 0.715 0.762 0.806 0.927 
6 0.713 0.743 0.788 0.826 0.927 
7 0.730 0.760 0.803 0.838 0.928 
8 0.749 0.778 0.818 0.851 0.932 
9 0.764 0.791 0.829 0.859 0.935 
10 0.781 0.806 0.842 0.869 0.938 
11 0.792 0.817 0.850 0.876 0.940 
12 0.805 0.828 0.859 0.883 0.943 
13 0.814 0.837 0.866 0.889 0.945 
14 0.825 0.846 0.874 0.895 0.947 
15 0.835 0.855 0.881 0.901 0.950 
16 0.844 0.863 0.887 0.906 0.952 
17 0.851 0.869 0.892 0.910 0.954 
18 0.858 0.874 0.897 0.914 0.956 
19 0.863 0.879 0.901 0.917 0.957 
20 0.868 0.884 0.905 0.920 0.959 
21 0.873 0.888 0.908 0.923 0.960 
22 0.878 0.892 0.911 0.926 0.961 
23 0.881 0.895 0.914 0.928 0.962 
24 0.884 0.898 0.916 0.930 0.963 
25 0.888 0.901 0.918 0.931 0.964 
26 0.891 0.904 0.920 0.933 0.965 
27 0.894 0.906 0.923 0.935 0.965 
28 0.896 0.908 0.924 0.936 0.966
29 0.898 0.910 0.926 0.937 0.966 
30 0.900 0.912 0.927 0.939 0.967 
31 0.902 0.914 0.929 0.940 0.967 
32 0.904 0.915 0.930 0.941 0.968 
33 0.906 0.917 0.931 0.942 0.968 
34 0.908 0.919 0.933 0.943 0.969 
35 0.910 0.920 0.934 0.944 0.969 
36 0.912 0.922 0.935 0.945 0.970 
37 0.914 0.924 0.936 0.946 0.970 
38 0.916 0.925 0.938 0.947 0.971 
39 0.917 0.927 0.939 0.948 0.971 
40 0.919 0.928 0.940 0.949 0.972 
41 0.920 0.929 0.941 0.950 0.972
42 0.922 0.930 0.942 0.951 0.972 
43 0.923 0.932 0.943 0.951 0.973 
44 0.924 0.933 0.944 0.952 0.973 
45 0.926 0.934 0.945 0.953 0.973 
46 0.927 0.935 0.945 0.953 0.974 
47 0.928 0.936 0.946 0.954 0.974
48 0.929 0.937 0.947 0.954 0.974 
49 0.929 0.937 0.947 0.955 0.974 
50 0.930 0.938 0.947 0.955 0.974 


From S. S. Shapiro and M. B. Wilk, “An Analysis of Variance Test for Nor- 
mality (Complete Samples),” Biometrika, 52 (1965), 591-611. Reprinted by 
permission. 
Table XX. Critical Values of the D’Agostino’s D Test
for Normality 


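This table gives critical values of the D’Agostino’s D test for normality. The critical values are given for α = 0.20, 0.10, 0.05, 0.02, 0.01; and n = 10 (2) 50 (10) 100 (20) 200 (50) 500 (100) 1000 (250) 2000. Each entry gives a pair of lower and upper critical values; the hypothesis of normality is rejected at level α if D falls outside this pair.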


Critical Value (α), each entry (lower, upper)
n 0.20 0.10 0.05 0.02 0.01
10 (0.2632, 0.2835) (0.2573, 0.2843) (0.2513, 0.2849) (0.2436, 0.2855) (0.2379, 0.2857)
12 (0.2653, 0.2841) (0.2598, 0.2849) (0.2544, 0.2854) (0.2473, 0.2859) (0.2420, 0.2862)
14 (0.2669, 0.2846) (0.2618, 0.2853) (0.2568, 0.2858) (0.2503, 0.2862) (0.2455, 0.2865)
16 (0.2681, 0.2848) (0.2634, 0.2855) (0.2587, 0.2860) (0.2527, 0.2865) (0.2482, 0.2867)
18 (0.2690, 0.2850) (0.2646, 0.2855) (0.2603, 0.2862) (0.2547, 0.2866) (0.2505, 0.2868)
20 (0.2699, 0.2852) (0.2657, 0.2857) (0.2617, 0.2863) (0.2564, 0.2867) (0.2525, 0.2869)
22 (0.2705, 0.2853) (0.2670, 0.2859) (0.2629, 0.2864) (0.2579, 0.2869) (0.2542, 0.2870)
24 (0.2711, 0.2853) (0.2675, 0.2860) (0.2638, 0.2865) (0.2591, 0.2870) (0.2557, 0.2871)
26 (0.2717, 0.2854) (0.2682, 0.2861) (0.2647, 0.2866) (0.2603, 0.2870) (0.2570, 0.2872)
28 (0.2721, 0.2854) (0.2688, 0.2861) (0.2655, 0.2866) (0.2612, 0.2870) (0.2581, 0.2873)
30 (0.2725, 0.2854) (0.2693, 0.2861) (0.2662, 0.2866) (0.2622, 0.2871) (0.2592, 0.2872)
32 (0.2729, 0.2854) (0.2698, 0.2862) (0.2668, 0.2867) (0.2630, 0.2871) (0.2600, 0.2873)
34 (0.2732, 0.2854) (0.2703, 0.2862) (0.2674, 0.2867) (0.2636, 0.2871) (0.2609, 0.2873)
36 (0.2735, 0.2854) (0.2707, 0.2862) (0.2679, 0.2867) (0.2643, 0.2871) (0.2617, 0.2873)
38 (0.2738, 0.2854) (0.2710, 0.2862) (0.2683, 0.2867) (0.2649, 0.2871) (0.2623, 0.2873)
40 (0.2740, 0.2854) (0.2714, 0.2862) (0.2688, 0.2867) (0.2655, 0.2871) (0.2630, 0.2874)
42 (0.2743, 0.2854) (0.2717, 0.2861) (0.2691, 0.2867) (0.2659, 0.2871) (0.2636, 0.2874)
44 (0.2745, 0.2854) (0.2720, 0.2861) (0.2695, 0.2867) (0.2664, 0.2871) (0.2641, 0.2874)
46 (0.2747, 0.2854) (0.2722, 0.2861) (0.2698, 0.2866) (0.2668, 0.2871) (0.2646, 0.2874)
48 (0.2749, 0.2854) (0.2725, 0.2861) (0.2702, 0.2866) (0.2672, 0.2871) (0.2651, 0.2874)
50 (0.2751, 0.2853) (0.2727, 0.2861) (0.2705, 0.2866) (0.2676, 0.2871) (0.2655, 0.2874)
60 (0.2757, 0.2852) (0.2737, 0.2860) (0.2717, 0.2865) (0.2692, 0.2870) (0.2673, 0.2873)
70 (0.2763, 0.2851) (0.2744, 0.2859) (0.2726, 0.2864) (0.2708, 0.2869) (0.2687, 0.2872)
80 (0.2768, 0.2850) (0.2750, 0.2857) (0.2734, 0.2863) (0.2713, 0.2868) (0.2698, 0.2871)
90 (0.2771, 0.2849) (0.2755, 0.2856) (0.2740, 0.2862) (0.2721, 0.2866) (0.2707, 0.2870)
100 (0.2774, 0.2849) (0.2759, 0.2855) (0.2745, 0.2860) (0.2727, 0.2865) (0.2714, 0.2869)
120 (0.2779, 0.2847) (0.2765, 0.2853) (0.2752, 0.2858) (0.2737, 0.2863) (0.2725, 0.2866)
140 (0.2782, 0.2846) (0.2770, 0.2852) (0.2758, 0.2856) (0.2744, 0.2862) (0.2734, 0.2865)
160 (0.2785, 0.2845) (0.2774, 0.2851) (0.2763, 0.2855) (0.2750, 0.2860) (0.2741, 0.2863)
180 (0.2787, 0.2844) (0.2777, 0.2850) (0.2767, 0.2854) (0.2755, 0.2859) (0.2746, 0.2862)
200 (0.2789, 0.2843) (0.2779, 0.2848) (0.2770, 0.2853) (0.2759, 0.2857) (0.2751, 0.2860)
250 (0.2793, 0.2841) (0.2784, 0.2846) (0.2776, 0.2850) (0.2767, 0.2855) (0.2760, 0.2858)
300 (0.2796, 0.2840) (0.2788, 0.2844) (0.2781, 0.2848) (0.2772, 0.2853) (0.2766, 0.2855)
350 (0.2798, 0.2839) (0.2791, 0.2843) (0.2784, 0.2847) (0.2776, 0.2851) (0.2771, 0.2853)
400 (0.2799, 0.2838) (0.2793, 0.2842) (0.2787, 0.2845) (0.2780, 0.2849) (0.2775, 0.2852)
450 (0.2801, 0.2837) (0.2795, 0.2841) (0.2789, 0.2844) (0.2782, 0.2848) (0.2778, 0.2851)
500 (0.2802, 0.2836) (0.2796, 0.2840) (0.2791, 0.2843) (0.2785, 0.2847) (0.2780, 0.2849)
600 (0.2804, 0.2835) (0.2799, 0.2839) (0.2794, 0.2842) (0.2788, 0.2845) (0.2784, 0.2847)
700 (0.2805, 0.2834) (0.2800, 0.2838) (0.2796, 0.2840) (0.2791, 0.2844) (0.2787, 0.2846)
800 (0.2806, 0.2833) (0.2802, 0.2837) (0.2798, 0.2839) (0.2793, 0.2842) (0.2790, 0.2844)
900 (0.2807, 0.2833) (0.2803, 0.2836) (0.2799, 0.2838) (0.2795, 0.2841) (0.2792, 0.2843)
1000 (0.2808, 0.2832) (0.2804, 0.2835) (0.2800, 0.2838) (0.2796, 0.2840) (0.2793, 0.2842)
1250 (0.2809, 0.2831) (0.2806, 0.2834) (0.2803, 0.2836) (0.2799, 0.2839) (0.2797, 0.2840)
1500 (0.2810, 0.2830) (0.2807, 0.2833) (0.2805, 0.2835) (0.2801, 0.2837) (0.2799, 0.2839)
1750 (0.2811, 0.2830) (0.2808, 0.2832) (0.2806, 0.2834) (0.2803, 0.2836) (0.2801, 0.2838)
2000 (0.2812, 0.2829) (0.2809, 0.2831) (0.2807, 0.2833) (0.2804, 0.2835) (0.2802, 0.2837)


From R. B. D’Agostino and M. A. Stephens, Goodness-of-Fit Techniques, © 1986 by Marcel Dekker, Inc., New York. Reprinted by permission.
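For D’Agostino’s D, this sketch assumes the usual definition D = Σ_i [i − (n + 1)/2] x_(i) / √(n³ SS), where x_(i) are the ordered observations and SS is the sum of squared deviations. Under normality D is close to 1/(2√π) ≈ 0.2821, and the test rejects when D falls outside the tabled (lower, upper) pair. The names are illustrative.

```python
import math
from typing import Sequence


def dagostino_d(x: Sequence[float]) -> float:
    """D = sum_i (i - (n + 1)/2) * x_(i) / sqrt(n**3 * SS)."""
    xs = sorted(x)
    n = len(xs)
    t = sum((i - (n + 1) / 2.0) * v for i, v in enumerate(xs, start=1))
    mean = sum(xs) / n
    ss = sum((v - mean) ** 2 for v in xs)
    return t / math.sqrt(n ** 3 * ss)


def dagostino_rejects(d: float, lower: float, upper: float) -> bool:
    """Two-sided decision against a (lower, upper) pair from Table XX."""
    return d < lower or d > upper


if __name__ == "__main__":
    data = list(range(1, 11))  # evenly spaced values, lighter-tailed than normal
    d = dagostino_d(data)
    # n = 10, alpha = 0.05 pair from Table XX: (0.2513, 0.2849)
    print(f"D = {d:.4f}, reject at 0.05: {dagostino_rejects(d, 0.2513, 0.2849)}")
```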
Table XXI. Critical Values of the Bartlett’s Test for Homogeneity
of Variances 


This table gives critical values of Bartlett’s test for homogeneity of variances having equal sample sizes in each group. Bartlett’s test statistic is the ratio of the weighted geometric mean of the sample variances to their weighted arithmetic mean (the weights are relative degrees of freedom). The critical values are given for α = 0.01, 0.05, 0.10; the number of groups p = 2 (1) 10; and the sample size in each group n = 3 (1) 30 (10) 60 (20) 100. We reject the hypothesis of homogeneity of variances at the α-level of significance if B < B_p(n, α), where B is the calculated value of Bartlett’s statistic and B_p(n, α) is the critical value having an area of size α in the left tail of Bartlett’s distribution. The critical values for equal sample sizes given in this table can also be used to obtain a highly accurate approximation of the critical values in the unequal sample size case by employing the following relation:


$$B_p(n_1, n_2, \ldots, n_p; \alpha) \approx \frac{n_1}{N} B_p(n_1, \alpha) + \frac{n_2}{N} B_p(n_2, \alpha) + \cdots + \frac{n_p}{N} B_p(n_p, \alpha),$$


where N = n_1 + n_2 + ⋯ + n_p, B_p(n_1, n_2, ..., n_p; α) denotes the α-level critical value of Bartlett’s test statistic with p groups having n_1, n_2, ..., n_p observations, respectively, and B_p(n_i, α), i = 1, 2, ..., p, denotes the α-level critical value in the equal sample size case with n_i observations in all p groups. For a given p, where p = 2 (1) 10, and for any combination of sample sizes from 5 (1) 100, the absolute error of this approximation is less than 0.005 (the percentage relative error is less than one-half of one percent) when α = 0.05, 0.10, or 0.25. When α = 0.01, the absolute error is approximately 0.015 in the extreme case and less than 0.005 when min(n_1, n_2, ..., n_p) ≥ 10. The approximation can be improved with the help of correction factors given in Dyer and Keating (1980, Table 2), and the absolute error of the corrected approximation is as small as for the other α values. To illustrate, suppose p = 4 and n_1 = 5, n_2 = 6, n_3 = 10, n_4 = 50, so that N = 71. Using the relation given previously, B_4(5, 6, 10, 50; 0.01) ≈ (5/71)(0.4607) + (6/71)(0.5430) + (10/71)(0.7195) + (50/71)(0.9433) = 0.8440. Using the correction factors given in Dyer and Keating (1980, Table 2), it can be shown that B_4(5, 6, 10, 50; 0.01) ≈ 0.8364. The exact value is B_4(5, 6, 10, 50; 0.01) = 0.8359.
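Both the statistic B and the weighted approximation described above fit in a few lines. This sketch assumes the weights are ν_i/Σν_j, as stated in the text. The dictionary holds the four equal-sample-size α = 0.01, p = 4 critical values quoted in the worked example, and all function names are hypothetical. Homogeneity is rejected when `bartlett_b(...)` falls below the (approximate) critical value.

```python
import math
from typing import Dict, Sequence


def bartlett_b(variances: Sequence[float], dfs: Sequence[float]) -> float:
    """Weighted geometric mean divided by weighted arithmetic mean of the variances.

    The weights are the relative degrees of freedom v_i / sum(v).
    """
    total = sum(dfs)
    log_geometric = sum(v / total * math.log(s2) for s2, v in zip(variances, dfs))
    arithmetic = sum(v / total * s2 for s2, v in zip(variances, dfs))
    return math.exp(log_geometric) / arithmetic


def approx_critical_value(sizes: Sequence[int], equal_n_values: Dict[int, float]) -> float:
    """B_p(n_1, ..., n_p; alpha) approximated by sum_i (n_i / N) * B_p(n_i, alpha)."""
    n_total = sum(sizes)
    return sum(n / n_total * equal_n_values[n] for n in sizes)


if __name__ == "__main__":
    # alpha = 0.01, p = 4 equal-sample-size values quoted in the worked example
    table = {5: 0.4607, 6: 0.5430, 10: 0.7195, 50: 0.9433}
    print(f"{approx_critical_value([5, 6, 10, 50], table):.4f}")  # 0.8440
```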
Table XXI (continued)


Number of Groups (p), α = 0.05
n p = 5
3 .3299
4 .4921
5 .5952
6 .6646
7 .7142
8 .7512
9 .7798
10 .8025
11 .8210
12 .8364
13 .8493
14 .8604
15 .8699
16 .8782
17 .8856
18 .8921
19 .8979
20 .9031
21 .9078
22 .9120
23 .9159
24 .9195
25 .9228
26 .9258
27 .9286
28 .9312
29 .9336
30 .9358
40 .9520
50 .9617
60 .9681
80 .9761
100 .9809
Table XXI (continued)


Number of Groups (p)
n 2 3 4 5 6 7 8 9 10
α = 0.10


3 .4359 .3991 .3966 .4006 .4061 .4116 - - -
4 .5928 55983 551 .5582 .5626 .5673 .5717 .5759 .5797
5 .6842 .6539 .6507 .6530 .6566 .6605 .6642 .6676 .6708


6 .7429 .7163 .7133 .7151 .7182 .7214 .7245 .7274 .7301
7 .7834 .7600 .7572 .7587 .7612 .7640 .7667 .7692 .7716
8 .8130 .7921 .7895 .7908 .7930 .7955 .7978 .8000 .8021
9 .8356 .8168 .8143 .8154 .8174 .8196 .8217 .8236 .8254
10 .8533 .8362 .8339 .8349 .8367 .8386 .8405 .8423 .8439


11 .8676 .8519 .8498 .8507 .8523 .8540 .8557 .8574 .8589
12 .8794 .8649 .8629 .8637 .8652 .8668 .8683 .8698 .8712
13 .8892 .8758 .8740 .8746 .8760 .8775 .8789 .8803 .8816
14 .8976 .8851 .8833 .8840 .8852 .8866 .8879 .8892 .8904
15 .9048 .8931 .8914 .8920 .8932 .8944 .8957 .8969 .8980


16 .9110 .9000 .8985 .8990 .9001 .9013 .9025 .9036 .9046
17 .9165 .9061 .9046 .9051 .9062 .9073 .9084 .9094 .9104
18 .9214 .9115 .9101 .9106 .9115 .9126 .9137 .9146 .9156
19 .9257 .9163 .9150 .9154 .9163 .9174 .9183 .9193 .9201
20 .9295 .9206 .9194 .9198 .9207 .9216 .9226 .9234 .9243


21 .9330 .9245 .9233 .9237 .9245 .9255 .9263 .9272 .9280
22 .9362 .9281 .9269 .9273 .9281 .9289 .9298 .9306 .9313
23 .9390 .9313 .9302 .9305 .9313 .9321 .9329 .9337 .9344
24 .9417 .9342 .9332 .9335 .9342 .9350 .9358 .9365 .9372
25 .9441 .9369 .9359 .9362 .9369 .9377 .9384 .9391 .9398


26 .9463 .9394 .9384 .9387 .9394 .9401 .9408 .9415 .9421
27 .9484 .9417 .9408 .9410 .9417 .9424 .9431 .9437 .9443
28 .9503 .9439 .9429 .9432 .9438 .9445 .9452 .9458 .9464
29 .9520 .9458 .9449 .9452 .9458 .9464 .9471 .9477 .9483
30 .9537 .9477 .9468 .9471 .9476 .9483 .9489 .9495 .9500


40 .9655 .9610 .9603 .9605 .9609 .9614 .9619 .9623 .9627
50 .9725 .9689 .9683 .9685 .9688 .9692 .9696 .9699 .9703
60 .9771 .9741 .9737 .9738 .9741 .9744 .9747 .9750 .9753
80 .9829 .9806 .9803 .9804 .9806 .9808 .9811 .9813 .9815
100 .9864 .9845 .9843 .9843 .9845 .9847 .9849 .9851 .9852


From D. D. Dyer and J. P. Keating, “On the Determination of Critical Values for Bartlett’s Test,” Journal of the American Statistical Association, 75 (1980), 313-319. Abridged and reprinted by permission.
Table XXII. Critical Values of the Hartley’s Maximum F Ratio Test
for Homogeneity of Variances 


This table gives critical values of Hartley’s maximum F ratio test for homogeneity of variances having equal sample sizes in each group. The critical values are given for α = 0.05, 0.01; the number of groups p = 2 (1) 12; and the number of degrees of freedom for variance estimate ν = 2 (1) 10, 12, 15, 20, 30, 60, ∞.


Number of Groups (p)
ν α 2 3 4 5 6 7 8 9 10 11 12


2 .05 39.00 87.50 142.00 202.00 266.00 333.00 403.00 475.00 550.00 626.00 704.00
.01 199.00 448.00 729.00 1036.00 1362.00 1705.00 2063.00 2432.00 2813.00 3204.00 3605.00


3 .05 15.40 27.80 39.20 50.70 62.00 72.90 83.50 93.90 104.00 114.00 124.00
.01 47.50 85.00 120.00 151.00 184.00 216.00 249.00 281.00 310.00 337.00 361.00


4 .05 9.60 15.50 20.60 25.20 29.50 33.60 37.50 41.40 44.60 48.00 51.40
.01 23.20 37.00 49.00 59.00 69.00 79.00 89.00 97.00 106.00 113.00 120.00


5 .05 7.15 10.80 13.70 16.30 18.70 20.80 22.90 24.70 26.50 28.20 29.90
.01 14.90 22.00 28.00 33.00 38.00 42.00 46.00 50.00 54.00 57.00 60.00
6 .05 5.82 8.38 10.40 12.10 13.70 15.00 16.30 17.50 18.60 19.70 20.70
.01 11.10 15.50 19.10 22.00 25.00 27.00 30.00 32.00 34.00 36.00 37.00
7 .05 4.99 6.94 8.44 9.70 10.80 11.80 12.70 13.50 14.30 15.10 15.80
.01 8.89 12.10 14.50 16.50 18.40 20.00 22.00 23.00 24.00 26.00 27.00
8 .05 4.43 6.00 7.18 8.12 9.03 9.78 10.50 11.10 11.70 12.20 12.70
.01 7.50 9.90 11.70 13.20 14.50 15.80 16.90 17.90 18.90 19.80 21.00
9 .05 4.03 5.34 6.31 7.11 7.80 8.41 8.95 9.45 9.91 10.30 10.70
.01 6.54 8.50 9.90 11.10 12.10 13.10 13.90 14.70 15.30 16.00 16.60
10 .05 3.72 4.85 5.67 6.34 6.92 7.42 7.87 8.28 8.66 9.01 9.34
.01 5.85 7.40 8.60 9.60 10.40 11.10 11.80 12.40 12.90 13.40 13.90
12 .05 3.28 4.16 4.79 5.30 5.72 6.09 6.42 6.72 7.00 7.25 7.48
.01 4.91 6.10 6.90 7.60 8.20 8.70 9.10 9.50 9.90 10.20 10.60
15 .05 2.86 3.54 4.01 4.37 4.68 4.95 5.19 5.40 5.59 5.77 5.93
.01 4.07 4.90 5.50 6.00 6.40 6.70 7.10 7.30 7.50 7.80 8.00
20 .05 2.46 2.95 3.29 3.54 3.76 3.94 4.10 4.24 4.37 4.49 4.59
.01 3.32 3.80 4.30 4.60 4.90 5.10 5.30 5.50 5.60 5.80 5.90
30 .05 2.07 2.40 2.61 2.78 2.91 3.02 3.12 3.21 3.29 3.36 3.39
.01 2.63 3.00 3.30 3.40 3.60 3.70 3.80 3.90 4.00 4.10 4.20
60 .05 1.67 1.85 1.96 2.04 2.11 2.17 2.22 2.26 2.30 2.33 2.36
.01 1.96 2.20 2.30 2.40 2.40 2.50 2.50 2.60 2.60 2.70 2.70
∞ .05 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
.01 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00


From H. A. David, “Upper 5 and 1% Points of the Maximum F-Ratio,” Biometrika, 39 (1952), 422-424. Reprinted by permission.
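Hartley’s statistic is the ratio of the largest to the smallest of the p sample variances, each based on ν degrees of freedom. Homogeneity is rejected when this ratio exceeds the tabled value. The minimal sketch below uses illustrative names and hypothetical variances, together with the entry 4.85 for p = 3, ν = 10, α = 0.05 from Table XXII.

```python
from typing import Sequence


def hartley_fmax(variances: Sequence[float]) -> float:
    """Largest sample variance divided by the smallest (equal df per group assumed)."""
    if len(variances) < 2 or min(variances) <= 0:
        raise ValueError("need at least two positive variances")
    return max(variances) / min(variances)


if __name__ == "__main__":
    s2 = [2.0, 5.0, 8.0]  # hypothetical variances, each on nu = 10 df
    fmax = hartley_fmax(s2)
    print(fmax, "reject" if fmax > 4.85 else "do not reject")  # 4.0 do not reject
```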
Table XXIII. Critical Values of the Cochran’s C Test
for Homogeneity of Variances 


This table gives critical values of Cochran’s C test for homogeneity of variances having equal sample sizes in each group. The critical values are given for α = 0.05, 0.01; the number of groups p = 2 (1) 10, 12, 15, 20, 24, 30, 40, 60, 120; and the number of degrees of freedom for variance estimate ν = 1 (1) 10, 16, 36, 144, ∞.


Number of Groups (p)
For each p, the first line gives the critical values for α = 0.05 and the second line those for α = 0.01, at ν = 1 (1) 10, 16, 36, 144, ∞ unless other values of ν are stated.


p = 3 (ν = 36, 144, ∞)
.05 .4748 .4031 .3333
.01 .5153 .4230 .3333

p = 6
.05 .7808 .6161 .5321 .4803 .4447 .4184 .3980 .3817 .3682 .3568 .3135 .2612 .2119 .1667
.01 .8828 .7218 .6258 .5635 .5195 .4866 .4608 .4401 .4229 .4084 .3529 .2858 .2229 .1667

p = 7 (ν = 3 (1) 10, 16, 36, 144, ∞)
.05 .4800 .4307 .3974 .3726 .3535 .3384 .3259 .3154 .2756 .2278 .1833 .1429
.01 .5685 .5080 .4659 .4347 .4105 .3911 .3751 .3616 .3105 .2494 .1929 .1429

p = 8
.05 .6798 .5157 .4377 .3910 .3595 .3362 .3185 .3043 .2926 .2829 .2462 .2022 .1616 .1250
.01 .7945 .6152 .5209 .4627 .4226 .3932 .3704 .3522 .3373 .3248 .2779 .2214 .1700 .1250

p = 9
.05 .6385 .4775 .4027 .3584 .3286 .3067 .2901 .2768 .2659 .2568 .2226 .1820 .1446 .1111
.01 .7544 .5727 .4810 .4251 .3870 .3592 .3378 .3207 .3067 .2950 .2514 .1992 .1521 .1111

p = 10
.05 .6020 .4450 .3733 .3311 .3029 .2823 .2666 .2541 .2439 .2353 .2032 .1655 .1308 .1000
.01 .7175 .5358 .4469 .3934 .3572 .3308 .3106 .2945 .2813 .2704 .2297 .1811 .1376 .1000

p = 12
.05 .5410 .3924 .3264 .2880 .2624 .2439 .2299 .2187 .2098 .2020 .1737 .1403 .1100 .0833
.01 .6528 .4751 .3919 .3428 .3099 .2861 .2680 .2535 .2419 .2320 .1961 .1535 .1157 .0833

p = 15
.05 .4709 .3346 .2758 .2419 .2195 .2034 .1911 .1815 .1736 .1671 .1429 .1144 .0889 .0667
.01 .5747 .4069 .3317 .2882 .2593 .2386 .2228 .2104 .2002 .1918 .1612 .1251 .0934 .0667

p = 20 (ν = 1, 2, 3)
.05 .3894 .2705 .2205
.01 .4799 .3297 .2654

p = 24
.05 .3434 .2354 .1907 .1656 .1493 .1374 .1286 .1216 .1160 .1113 .0942 .0743 .0567 .0417
.01 .4247 .2871 .2295 .1970 .1759 .1608 .1495 .1406 .1338 .1283 .1060 .0810 .0595 .0417

p = 30 (ν = 36, 144, ∞)
.05 .0604 .0457 .0333
.01 .0658 .0480 .0333

p = 40
.05 .2370 .1567 .1259 .1082 .0968 .0887 .0827 .0780 .0745 .0713 .0595 .0462 .0347 .0250
.01 .2940 .1915 .1508 .1281 .1135 .1033 .0957 .0898 .0853 .0816 .0668 .0503 .0363 .0250

p = 60
.05 .1737 .1131 .0895 .0765 .0682 .0623 .0583 .0552 .0520 .0497 .0411 .0316 .0234 .0167
.01 .2151 .1371 .1069 .0902 .0796 .0722 .0668 .0625 .0594 .0567 .0461 .0344 .0245 .0167

p = 120
.05 .0998 .0632 .0495 .0419 .0371 .0337 .0312 .0292 .0279 .0266 .0218 .0165 .0120 .0083
.01 .1225 .0759 .0585 .0489 .0429 .0387 .0357 .0334 .0316 .0302 .0242 .0178 .0125 .0083


From C. Eisenhart, M. W. Hastay, and W. A. Wallis, Techniques of Statistical Analysis, Chapter 15, pp. 390-391, © 1947 by McGraw-Hill, New York. Reprinted by permission.
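Cochran’s statistic is the largest sample variance divided by the sum of all p sample variances, again with equal degrees of freedom ν per group. Large values lead to rejection of homogeneity. The sketch below uses illustrative names and hypothetical variances, together with the entry .3568 for p = 6, ν = 10, α = 0.05 from Table XXIII.

```python
from typing import Sequence


def cochran_c(variances: Sequence[float]) -> float:
    """Largest sample variance divided by the sum of all sample variances."""
    if len(variances) < 2 or min(variances) < 0 or sum(variances) == 0:
        raise ValueError("need at least two nonnegative variances with a positive sum")
    return max(variances) / sum(variances)


if __name__ == "__main__":
    s2 = [1.0, 1.0, 1.0, 1.0, 1.0, 5.0]  # hypothetical variances, each on nu = 10 df
    c = cochran_c(s2)
    print(c, "reject" if c > 0.3568 else "do not reject")  # 0.5 reject
```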
Table XXIV. Random Numbers


This table gives computer-generated pseudorandom digits that may be read in any direction: horizontally from left to right, or vertically up or down. The numbers may be read as single digits, double digits, or digits of any size. For ease of reading, the digits are arranged in groups of five; this grouping should be ignored when reading the table. The table should be entered at a random starting point, and the digits should be crossed out as they are used so that each portion of the table is used only once in a given experiment.
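Where a printed table is unavailable, a seeded pseudorandom generator can play the same role. The sketch below uses Python’s `random.Random` and prints digits in groups of five to mimic the layout described above. It is an illustrative substitute, not the procedure used to generate Table XXIV, and the seed value is an arbitrary assumption.

```python
import random
from typing import List, Optional


def random_digit_groups(n_groups: int, group_size: int = 5,
                        seed: Optional[int] = None) -> List[str]:
    """Return n_groups strings of pseudorandom decimal digits."""
    rng = random.Random(seed)  # a fixed seed makes the sequence reproducible
    return ["".join(str(rng.randrange(10)) for _ in range(group_size))
            for _ in range(n_groups)]


if __name__ == "__main__":
    print(" ".join(random_digit_groups(10, seed=2024)))
```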
Chart I. Power Functions of the Two-Sided Student’s t Test


This chart shows graphs of power functions of the two-sided Student’s t test. There are two graphs of power functions corresponding to two levels of significance, α = 0.01 and 0.05. Since the distribution of t is symmetrical about zero, the two-tailed levels of significance also represent one-tailed values of 0.005 and 0.025. For each graph, curves are drawn for values of df = 1, 2, 3, 4, 6, 12, 24, and ∞, the degrees of freedom associated with the variance estimate. The horizontal scale of the graph represents the noncentrality parameter δ and the vertical scale the corresponding power. To test the hypothesis H_0: μ = μ_0 against the alternative H_1: μ = μ_1, the statistic t = (X̄ − μ_0)/(S/√n) is used. The distribution of t when H_0 is true is the t distribution with df = n − 1 degrees of freedom, and the critical region t > t[df, 1 − α] would have a significance level α. Under H_1, the distribution of t(δ) = [(X̄ − μ_1) + δσ/√n]/(S/√n) is noncentral t with the noncentrality parameter δ = (μ_1 − μ_0)/(σ/√n). The power of the test is given by P{t(δ) > t[df, 1 − α]}. For example, for a two-tailed test with α = 0.05, df = 6, and δ = 3, the corresponding power read from the graph is approximately 0.70.
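The power read from Chart I can be approximated by simulating the noncentral t variable t(δ) = (Z + δ)/√(χ²_df/df). This sketch assumes that the caller supplies the two-sided critical value t[6, 0.975] ≈ 2.447. With df = 6 and δ = 3, it should return a value near the 0.70 read from the chart. The names, seed, and number of draws are illustrative.

```python
import math
import random


def t_test_power_mc(delta: float, df: int, t_crit: float, two_sided: bool = True,
                    draws: int = 200_000, seed: int = 2024) -> float:
    """Monte Carlo power of the t test against noncentrality delta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        z = rng.gauss(0.0, 1.0)
        # S / sigma = sqrt(chi-square(df) / df); chi-square drawn as gamma(df/2, 2)
        s = math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)
        t = (z + delta) / s
        if t > t_crit or (two_sided and t < -t_crit):
            hits += 1
    return hits / draws


if __name__ == "__main__":
    print(f"{t_test_power_mc(3.0, 6, 2.447):.2f}")  # close to the chart value 0.70
```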
Chart I (continued): α = 0.05 panel; vertical axis: power (in percent).
From D. B. Owen, Handbook of Statistical Tables, © 1962 by Addison-Wesley. Reprinted by permission.
Chart II. Power Functions of the Analysis of Variance F Tests
(Fixed Effects Model): Pearson-Hartley Charts


This chart shows a set of graphs of power functions of the analysis of variance F tests in a fixed effects analysis of variance model. There are eight graphs for eight values of the numerator degrees of freedom ν_1 = 1 (1) 8. For each value of ν_1, there are several values of the denominator degrees of freedom ν_2 = 6 (1) 10, 12, 15, 20, 30, 60, ∞. Each graph depicts two groups of power functions corresponding to two levels of significance, α = 0.01 and 0.05. The horizontal scale of the graph represents the normalized noncentrality parameter (φ) and the vertical scale the corresponding power (1 − β). There are two x-scales depending on which level of significance is employed. For example, for ν_1 = 2, ν_2 = 6, α = 0.05, and φ = 2, the corresponding power read from the graph is approximately 1 − β = 0.66.
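A similar simulation can approximate the Pearson-Hartley power. This sketch assumes the usual relation φ² = λ/(ν_1 + 1) between φ and the noncentrality parameter λ of the noncentral F distribution. It also assumes that the caller supplies the central F critical value, which is about 5.14 for ν_1 = 2, ν_2 = 6, α = 0.05. For φ = 2 the estimate should come out close to the 0.66 quoted above. The names and seed are illustrative.

```python
import math
import random


def anova_power_mc(phi: float, nu1: int, nu2: int, f_crit: float,
                   draws: int = 100_000, seed: int = 2024) -> float:
    """Monte Carlo power of the fixed-effects F test for Pearson-Hartley phi."""
    lam = phi ** 2 * (nu1 + 1)  # assumed relation phi**2 = lambda / (nu1 + 1)
    shift = math.sqrt(lam)
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # noncentral chi-square(nu1, lambda) = (Z + sqrt(lambda))**2 + chi-square(nu1 - 1)
        num = (rng.gauss(0.0, 1.0) + shift) ** 2
        if nu1 > 1:
            num += rng.gammavariate((nu1 - 1) / 2.0, 2.0)
        den = rng.gammavariate(nu2 / 2.0, 2.0)  # central chi-square(nu2)
        if (num / nu1) / (den / nu2) > f_crit:
            hits += 1
    return hits / draws


if __name__ == "__main__":
    print(f"{anova_power_mc(2.0, 2, 6, 5.14):.2f}")  # close to the chart value 0.66
```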
Chart II (continued)
Chart III. Operating Characteristic Curves for the Analysis of
Variance F Tests (Random Effects Model)


This chart shows graphs of curves giving 1 − power for the analysis of variance F tests in a random effects analysis of variance model. There are eight graphs for eight values of the numerator degrees of freedom ν_1 = 1 (1) 8. For each value of ν_1, there are several values of the denominator degrees of freedom ν_2 = 6 (1) 10, 12, 15, 20, 30, 60, ∞. Each graph depicts two groups of operating characteristic curves corresponding to two levels of significance, α = 0.01 and 0.05. The horizontal scale of each graph represents the parameter λ = √(1 + nσ_α²/σ_e²) and the vertical scale the corresponding probability of accepting the hypothesis (1 − power). There are two x-scales depending on which level of significance is employed. For example, for ν_1 = 2, ν_2 = 6, α = 0.01, and λ = 7, the corresponding power read from the graph is approximately 1 − 0.20 = 0.80.
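In the random effects model the F ratio is distributed as λ² times a central F(ν_1, ν_2) variable, so the power is P{F(ν_1, ν_2) > F_crit/λ²}. When ν_1 = 2, the F tail probability has the closed form P{F > f} = (1 + 2f/ν_2)^(−ν_2/2), which lets the example above be checked without special functions. This sketch is restricted to ν_1 = 2; other values of ν_1 would need an incomplete beta routine. The function names are illustrative.

```python
def f2_critical_value(nu2: float, alpha: float) -> float:
    """Upper-alpha point of F(2, nu2), solving (1 + 2f/nu2)**(-nu2/2) = alpha."""
    return (nu2 / 2.0) * (alpha ** (-2.0 / nu2) - 1.0)


def random_effects_power_nu1_2(lam: float, nu2: float, alpha: float) -> float:
    """Power of the random-effects F test with nu1 = 2.

    lam is lambda = sqrt(1 + n * sigma_a**2 / sigma_e**2); under the alternative
    the F ratio is lam**2 times a central F(2, nu2) variable.
    """
    f_scaled = f2_critical_value(nu2, alpha) / lam ** 2
    return (1.0 + 2.0 * f_scaled / nu2) ** (-nu2 / 2.0)


if __name__ == "__main__":
    print(f"{f2_critical_value(6, 0.01):.2f}")                # 10.92
    print(f"{random_effects_power_nu1_2(7.0, 6, 0.01):.3f}")  # 0.806, near the chart's 0.80
```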
Chart III (continued)
From A. H. Bowker and G. J. Lieberman, Engineering Statistics, 2nd ed., © 1972 by
Prentice-Hall, Englewood Cliffs, New Jersey. Reprinted by permission. 
Chart IV. Curves of Constant Power for Determination of Sample
Size in a One-Way Analysis of Variance (Fixed Effects Model):
Feldt-Mahmoud Charts


This chart shows graphs of curves of constant power for the analysis of variance F tests in a fixed effects analysis of variance model. The graphs give the values of n (y-scale) as a function of φ′ = (1/σ)√(Σ_{i=1}^{r} α_i²/r) for specified values of the number of groups r, the level of significance α, and the power P (= 1 − β). Each graph depicts two groups of curves corresponding to two levels of significance, α = 0.05 and 0.01. The graphs are given for r = 2, 3, 4, 5; and for each value of r, the values of P used in drawing the curves are 0.5, 0.7, 0.8, 0.9, and 0.95. There are two x-scales depending on which level of significance is employed. For a given set of values of r, α, P, and φ′, the sample size n may be read from the ordinate of the graph. For example, for r = 3, α = 0.05, P = 0.7, and φ′ = 0.3, the value of n read from the chart is approximately 29.
Chart IV (continued)
From L. S. Feldt and M. W. Mahmoud, “Power Function Charts for Specifying Number of Observations in Analysis of Variance of Fixed Effects,” Annals of Mathematical Statistics, 29 (1958), 871-877. Reprinted by permission.
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Blocks random | 
treatments fixed and, 492-493 
treatments random and, 492 
BLUE. See Best linear unbiased estimates 
BMD (Biomedical) package, 558 
BMDP (Biomedical Programs), 543. See 
also Statistical computing 
packages 
analysis of variance using, 558-559 
BMDP 3D, 1V, 4V, and 5V, 559 
BMDP 7D, 52, 164, 252, 253, 559, 565, 
567 | 
program and output, 53, 54, 254, 489 
BMDP 2V, 164, 252, 253, 334, 381, 451, 
558, 559 
program and output, 165, 255, 335, 
496, 506, 511 
BMDP 3V, 52, 164, 252, 381, 421, 451, 
559 
program and output, 56, 385, 424 
BMDP 8V, 52, 55, 164, 252, 253, 381, 
421, 451, 559 
program and output, 55, 166, 167, 257, 
259, 336, 338, 382, 383, 386, 422, 
425, 451, 519
Bonferroni inequality, 78 
Bonferroni t statistic, 78-79, table of
critical values of, 645-647
Bonferroni’s test/method/interval 
in one-way classification, 78-79, 80 
in three-way crossed classification,
321-322 
in two-way crossed classification with 
interaction, 233 
in two-way crossed classification 
without interaction, 150 
in two-way nested classification, 
361-362, 365 
using statistical computing packages, 
561-563 
Boole inequality, 78 
Box-plots, 89 
Brown-Forsythe procedure, 107-108,
566
BY keyword, in an SPSS procedure,
551-552 


C test for homogeneity of variances. See
Cochran’s C test for homogeneity
of variances
Calculators, electronic, x
Cells, 126
Central limit theorem, 368, 569
Chi-square distribution, definition and
properties of, 570-571
table of critical values of, 610-611
noncentral. See Noncentral chi-square
distribution
Chi-square goodness-of-fit test for
normality, 89-90
CLASS keyword/statement, in a SAS
procedure, 524, 546-549
Classification, term, 125-126
Cochran’s C test for homogeneity of
variances, 98, 105-107
table of critical values of, 664
Coefficient of kurtosis, 91-92
table of critical values of sample
estimate of, 655
Coefficient of skewness, 90-91
table of critical values of sample
estimate of, 655
Coefficients of order statistics for the
Shapiro-Wilk’s W test for
normality, table of, 656
Column efficiency, of a Latin square
design, 503
Completely nested design, definition and
examples of, 347-348, 395
Completely randomized design (CRD),
485-488, 489 
analysis of variance for, 486-487
mathematical model of, 486 
worked example for, 487, 488 
Components of variance. See Variance 
components 
Computational formulae and procedure 
for sums of squares 
in Latin square design, 502 
in one-way classification, 35-36
in partially nested classifications, 437 
in three-way crossed classification, 
297-298 
in three- and four-way nested 
classifications, 396-397, 401, 
403-404 
in two-way crossed classification with 
interaction, 210-212 
in two-way crossed classification 
without interaction, 144-145 
in two-way nested (hierarchical) 
classification, 359, 362-363 
Computing power, use of statistical 
packages for, 560-561 
Concomitant variable, term, 581 
Confidence coefficient, defined, 599 
Confidence intervals for variance 
components, in one-way random 
effects model, 31-34 
Conservative confidence interval, 
defined, 600 
Conservative test, defined, 598 
Consistency, defined, 599 
Consistent estimator, defined, 599 
CONTRAST command/subcommand, in
SAS ANOVA/GLM and SPSS 
ONEWAY/GLM procedures, 
562-565 
Contrasts, defined, 65 
test of hypothesis involving, 67-69
Control, in experimental design, 
484-485 
Correlation, defined, 588 
Covariance, defined, 587-588 
analysis of, 581-582 
Covariate, term, 581 
CRD. See Completely randomized 
design 
CRITERIA subcommand, in SPSS GLM
and VARCOMP procedures, 558, 
561 
Critical range values, Studentized, 80 
Critical region, defined, 597 
Critical values, defined, 597 
of Bartlett’s test for homogeneity of 
variances, table of, 659-662 
of chi-square distribution, table of, 
610-611 
of Cochran’s C test for homogeneity of 
variances, table of, 664 
of D’Agostino’s D test for normality, 
table of, 658 
of Duncan’s multiple range test, table 
of, 642-644 
of Dunn-Šidák’s multiple comparison
test, table of, 648-651 
of Dunnett’s test, table of, 639-641 
of F distribution, table of, 612-617
of Hartley’s maximum F ratio test for 
homogeneity of variances, table 
of, 663 
of sample estimate of coefficient of 
kurtosis, table of, 655 
of sample estimate of coefficient of 
skewness, table of, 655 
of Shapiro-Wilk’s W test for 
normality, table of, 657 
of Studentized augmented range 
distribution, table of, 654 
of Studentized maximum modulus 
distribution, table of, 652-653 
of Studentized range distribution, table 
of, 636-638 
of Student’s t distribution, table of,
608-609
Cross-nested design, defined, 431
Cross-over design, 520-521
Crossed classification, contrasted with a
nested classification, 347
term, 125-126
Cumulative standard normal distribution,
table of critical values of,
605-606
Curves of constant power for F tests in
fixed effects model (Model I) for
determination of sample size in a
one-way classification, 686-688


D’Agostino’s D test for normality, 96-97
table of critical values of, 658 
Degrees of freedom, 2, 569, 570, 
572-576 
concept of, 15-17 
for error variance in a one-way 
classification, 125 
in partially nested classifications, 435 
rules for calculating, 590-591 
DESIGN keyword/subcommand, in 
SPSS GLM and MANOVA 
procedures, 552-557 
Detecting outliers, 97 
Doubly noncentral beta, related to doubly 
noncentral F’, 576 
Doubly noncentral F distribution, 262 
definition and properties of, 576 
Doubly noncentral t distribution,
definition and properties of, 575 
Duncan’s multiple range test, 81, 
561-563 
table of critical values of, 642-644 
Dunn-Šidák’s multiple comparison test,
79-80, 562-563
table of critical values of, 648-651
Dunnett’s multiple comparison test, 
81-82, 563 
table of critical values of, 639-641 
Dunn’s multiple comparison test, 79 
table of critical values of, 645-647 


Effects parameter, 594
Efficiency, defined, 599
Efficient estimator, defined, 599
Electronic calculators, x
EMS (error mean square), its
custom-made use for multiple
comparisons using statistical
packages, 561
Equal variances, departures from, 86-88
Error mean square (EMS). See EMS
(error mean square)
Error sum of squares, 129, 183, 286, 351,
397, 435, 464, 475, 486, 491, 499,
508, 513
Error terms, 5, 589
departures from independence of,
88-89
Error variance, degrees of freedom in
one-way classification for, 125
Estimate, defined, 598
Estimated total variance, in one-way 
random effects model, 29 
Estimator, defined, 598 
Exact confidence interval, defined, 600 
Exact statistical test, defined, 598 
EXAMINE procedure, in SPSS, 567 
Expectations of mean squares 
in fixed effects models (Model I),
two-way crossed classification with
interaction, 186-188
two-way crossed classification
without interaction, 131-132
in mixed models (Model III),
in two-way crossed classification
with interaction, 190-193
in two-way crossed classification
without interaction, 133
in one-way classification, 17-20
in random effects model (Model II)
in two-way crossed classification 
with interaction, 188-190 
in two-way crossed classification 
without interaction, 132-133 
in three-way crossed classification, 286 
in two-way crossed classification with 
interaction, 184-193 
in two-way crossed classification 
without interaction, 130-134 
in various other designs or models. See 
under a specific design or model 
Expected mean squares, rules for finding, 
591-595 
Expected subclass numbers, method of, 
222 
Expected value, defined, 586-587 
Experimental data, collecting, 483 
Experimental designs, 
in agriculture and other sciences, 4 
literature on, 485 
principles of, 483-485 
some simple, 483-542 
Experimental unit, defined, 484 


F distribution, 23, 138, 198, 199, 201,
202, definition and properties of,
572-573
table of critical values of, 612-617
doubly noncentral, 262, definition and 
properties of, 576 
noncentral. See Noncentral 
F distribution 
F test(s) 
equivalent to paired t test, 584-586
equivalent to two-sample t test, in
one-way classification, 582-584 
in fixed effects model (Model I) 
curves of constant power for 
determination of sample size in 
one-way classification, 686-688
in three-way crossed classification, 
288, 289 
in two-way crossed classification 
with interaction, 198-200 
in two-way crossed classification 
without interaction, 137-139 
power function charts of, 672-680 
in mixed model (Model III) 
in three-way crossed classification, 
291-292 
in two-way crossed classification 
with interaction, 202-203 
in two-way crossed classification 
without interaction, 139 
in one-way classification, 22-25
in random effects model (Model II)
in one-way classification, table of 
power and optimum number of 
levels in, 630-633 
in three-way crossed classification, 
288-291, 292 
in two-way crossed classification 
with interaction, 200-202 
in two-way crossed classification 
without interaction, 139 
operating characteristic curves for, 
61, 681-685 
in three-way crossed classification, 
286, 288-292 
in two-way crossed classification with 
interaction, 197-203 
in two-way crossed classification 
without interaction, 137-139 
in two-way crossed finite population 
model, 466-467 
in two-way nested (hierarchical) 
classification, 351, 353, 363-365 
in various other designs or models. See 
under a specific design or model 
power of. See Power of F test
F value in an analysis of variance 
table, 26 
Feldt-Mahmoud charts, 62, 686-688 
Finite population models, 461-482 
four-way crossed, 472-473
more complex, 481
nested, 474-475
one-way, 461-462
statistical computing packages in, 481
three-way crossed, 470-472
two-way crossed, 462-470
unbalanced, 475
worked example for, 475-481 
Finite population theory, 8 
Finite populations, 7-8, 461 
Fisher’s Z distribution, related to F 
distribution, 572 
Fixed effects, 4, 5, concept of, 6-7
Fixed effects analysis, in an unbalanced 
two-way crossed classification 
model, 215-223 
general case of unequal frequencies 
for, 217-223 
proportional frequencies for, 
215-217 
Fixed effects model (Model I), 4, 5, 
481 
curves of constant power for F tests for 
determination of sample size in 
one-way classification, 686-688
effects of violations of assumptions of
two-way crossed classification
with interaction, 269
expectations of mean squares in,
in two-way crossed classification
with interaction, 186-188
in two-way crossed classification
without interaction, 131-132
F tests
in one-way classification, 22-24
in three-way crossed classification, 
288, 289 
in two-way crossed classification 
with interaction, 198-200 
in two-way crossed classification 
without interaction, 137-139 
in various other designs or models. 
See under a specific design or 
model 
interval estimation 
in two-way crossed classification 
with interaction, 207-208 
in two-way crossed classification 
without interaction, 142-143 
in two-way nested (hierarchical) 
classification, 356-357 
one-way classification, table of 
minimum sample size per 
treatment group needed in, 
634-635 
point estimation in 
in two-way crossed classification 
with interaction, 203-205
in two-way crossed classification
without interaction,
139-141
in two-way nested (hierarchical) 
classification, 354-355 
in various other designs or models. 
See under a specific design or 
model 
power function charts of F tests in, 
672-680 
power of F test in 
in one-way classification, 57-60 
in two-way crossed classification 
with interaction, 227-228 
in various other designs or models. 
See under a specific design or 
model 
sampling distribution of mean squares 
in 
in two-way crossed classification 
with interaction, 193, 195-196 
in two-way crossed classification 
without interaction, 135-136 
in various other designs or models. 
See under a specific design or 
model 
worked examples for 
in one-way classification, 39-43
in three-way crossed classification, 
314-322 
in two-way crossed classification 
with interaction, 233-237 
for unequal sample sizes per 
cell, 237-244 
in two-way crossed classification
without interaction, 151-154 
in two-way nested (hierarchical) 
classification, 368-371 
in various other designs or models.
See under a specific design or 
model 
Four-factor partially nested classification, 
438-439, 440 
Four-way crossed classification, 302, 
304-307 
Four-way crossed finite population 
model, 472-473
Four-way nested classification, 403-406
Fox charts, 58 
Fractional replications, 523-524 


General case of unequal frequencies 
for fixed effects analysis, 217-223 
for random effects analysis, 223-226 
General constant, in a linear model, 5 
General linear models (GLM), 5, 8 
General g-way nested classification, 
406-407
Generalized linear models, 8
Generalized randomized block design
(GRBD), 494
GLM (General linear models), 5, 8
GLM procedure, in SPSS, 52, 164, 252,
253, 333, 334, 381, 421, 451,
551-554, 556, 557
program and output, 55, 56, 166, 167,
256, 258, 336, 337, 383, 384, 386,
422, 423, 425, 450, 518
Graeco-Latin square design, 507-512
analysis of variance for, 507-509
mathematical model for, 507
worked example in, 510-512
Graeco-Latin squares, 507
some more examples of, 602-603
GRBD (generalized randomized block
design), 494
Grouping experimental units, 484


Hartley’s maximum F ratio test for 
homogeneity of variances, 98, 
104-105, 106-107
table of critical values of, 663

Hierarchical models, partially, 431-460 
Hierarchically nested design. See
Completely nested design
Higher-order crossed classifications, 
307-311 
Homogeneity of variances 
Bartlett’s test for, 98-104, 106-107 
table of critical values of, 659-662 
Cochran’s C test for, 98, 105-107 
table of critical values of, 664 
Hartley’s maximum F ratio test for, 
98, 104-105, 106-107 
table of critical values of, 663 
other tests for, 107-108 
HOMOGENEITY option, in SPSS 
procedures, 566 
Homoscedasticity 
tests for, 97-108. See also 
Homogeneity of variances 
entries 
use of statistical packages for, 
565-567 
transformations to correct lack of, 
110-113 
HOVTEST option, in SAS GLM 
procedure, 566 
Hyper-Graeco-Latin square design, 522 
Hypersquares, 522 
Hypothesis testing, general procedure of, 
597-598 


Incomplete block design, 516, 519 
Independence of error terms, departures 
from, 88-89 
Independent variable, term, in analysis of 
covariance, 581 
Infinite population theory, 461 
Infinite populations, 7-8, 461 
Interaction, defined, 177 
meaning and interpretation of, 253, 
256-259 
with one observation per cell, 
259, 261-264 
significant, 257 
two-way crossed classification with. 
See Two-way crossed 
classification with interaction 
two-way crossed classification 
without. See Two-way crossed 
classification without interaction 
Interaction effect, in a two-way crossed
classification model, 180 
Interaction sum of squares, in a two-way 
crossed classification model, 183 
Interaction terms, in an analysis of 
variance model, 589 
Interval estimation, general method of, 
599-601 
in fixed effects model (Model I) 
in two-way crossed classification 
with interaction, 207-208
in two-way crossed classification 
without interaction, 142-143 
in two-way nested (hierarchical) 
classification, 356-357 
in various other designs or models. 
See under a specific design or 
model 
in Latin square design, 500-501 
in mixed model (Model III)
in two-way crossed classification 
with interaction, 209-210 
in two-way crossed classification 
without interaction, 143-144 
in two-way nested (hierarchical) 
classification, 359 
in various other designs or models. 
See under a specific design or 
model 
in random effects model (Model II) 
in two-way crossed classification 
with interaction, 208-209 
in two-way crossed classification 
without interaction, 143 
in two-way nested (hierarchical) 
classification, 358 
in various other designs or models. 
See under a specific design or 
model 
in three-way crossed classification, 
292-297 
in two-way crossed classification with 
interaction, 207-210
in two-way crossed classification 
without interaction, 142-144 
in two-way crossed finite population 
model, 468-470
in two-way nested (hierarchical) 
classification, 356-359, 365-367 
in various other designs or models. See 
under a specific design or model 
Intraclass correlation, definition and 
properties of, 580-581 
Intraclass correlation coefficient, defined, 
581 
Intraclass correlations
in one-way classification, 29, 30, 34, 
38 
in two-way crossed classification with 
interaction, 182 
in two-way crossed classification 
without interaction, 128 
Inverse hyperbolic sine transformation, 
110n 


Jackknife technique, in finding 
confidence interval of a variance 
component, 33n, in testing
homogeneity of variances, 
107-108 


Kruskal-Wallis test, 87 
K test, 87, 269 
Kurtosis, 85, 91 
coefficient of. See Coefficient of 
kurtosis 
test for, 91-93 


Lagrangian interpolation, three-point, 
to calculate power of an F test, 621 
Latin square design, 495, 497-507 
analysis of variance for, 498-500 
computational formulae and procedure 
for sums of squares in, 502 
interval estimation in, 500-501 
mathematical model of, 498 
missing observations in, 502-503
multiple comparisons in, 501
point estimation in, 500-501
power of F test in, 501
relative efficiency of, 503-504
replications in, 504-505
worked example in, 505-507
Latin squares, 495, 497
some more examples of, 601-602
Lattice design, 519-520
Least significant difference test, 77-78
Least squares estimators 
in two-way crossed classification with
interaction, 203 
in two-way crossed classification 
without interaction, 139 
in two-way nested classification, 354 
Level of confidence, definition and 
interpretation of, 599 
Levene’s test for homogeneity of 
variances, 107, 566-567 
Liberal interval, defined, 600 
Liberal test, defined, 598 
Linear combination of means, defined, 65 
Linear (statistical) models, defined, 8 
Logarithmic transformation, 109, 111 
Lower confidence interval, defined, 600 
LSD (least significant difference),
77-78


Magic Latin squares, 522 
Main effects, in two-way crossed 
classification with interaction, 
180 
MANOVA procedure, in SPSS, 164, 252, 
253, 334, 381, 451, 551-555 
program and output, 254, 255, 334, 
382, 496, 506, 511 
Mathematical expectation, defined, 
586-587 
Maximum likelihood estimators 
in one-way classification, 28 
in two-way crossed classification with 
interaction, 203n, 206n 
in two-way crossed classification 
without interaction, 139n, 141 
in two-way nested classification, 356 
Maximum test for scale, for homogeneity 
of variances, 108 
MAXORDERS subcommand, in SPSS 
ANOVA procedure, 553 
Mean squares, defined, 2 
expectations of. See Expectations of 
mean squares 
expected, rules for finding, 591-595 
sampling distribution of. See Sampling 
distribution of mean squares 
Mean value, defined, 587 
Means 
linear combination of, defined, 65 
range of the set of, defined, 80 
MEANS procedure, in SPSS, 550 
MEANS statement, in SAS GLM
procedure, 52, 562, 565-566 
Method of expected subclass numbers, 222 
Method of unweighted means, 217-220 
Method of weighted-squares-of-means, 
220-222 
METHOD subcommand, in SPSS 
VARCOMP procedure, 557-558 
Minimum norm quadratic unbiased 
estimation (MINQUE), 224 
Minimum sample size per treatment 
group needed in one-way fixed 
effects design, table of, 634-635 
Minimum variance quadratic unbiased 
estimators (MIVQUE), 224 
Minimum variance unbiased (MVU) 
estimator, defined, 599 
MINQUE (minimum norm quadratic 
unbiased estimation), 224 
Missing observations or values 
in Latin square design, 502-503 
in split-plot design, 514, 516 
in two-way crossed classification 
without interaction, 145-148 
worked example for, in two-way 
crossed classification without 
interaction, 161-164 
MIVQUE (minimum variance quadratic 
unbiased estimators), 224 
Mixed-classification design, defined, 431 
Mixed effects analysis, in an unbalanced 
two-way crossed classification 
model, 226-227 
Mixed model (Model III), 4, 5, 481
alternate, 264-268 
effects of violations of assumptions of 
two-way crossed classification 
with interaction, 270 
expectations of mean squares in 
in two-way crossed classification 
with interaction, 190-193 
in two-way crossed classification 
without interaction, 133 
F tests in 
in three-way crossed classification, 
291-292 
in two-way crossed classification 
with interaction, 202-203 
in two-way crossed classification 
without interaction, 139 
in various other designs or models.
See under a specific design or
model
interval estimation in 
in two-way crossed classification 
with interaction, 209-210 
in two-way crossed classification 
without interaction, 
143-144 
in two-way nested (hierarchical) 
classification, 359, 367 
in various other designs or models. 
See under a specific design or 
model 
point estimation in 
in two-way crossed classification 
with interaction, 206-207 
in two-way crossed classification 
without interaction, 141 
in two-way nested (hierarchical) 
classification, 356, 357 
in various other designs or models. 
See under a specific design or 
model 
power of F test in, in two-way crossed 
classification with interaction, 
229-230
in various other designs or models. 
See under a specific design or 
model 
sampling distribution of mean 
squares in
in two-way crossed classification 
with interaction, 196-197 
in two-way crossed classification 
without interaction, 136-137 
Scheffé’s, 267-268, 285n 
worked examples for 
in partially nested classifications, 
444-448, 449 
in three-way crossed classification, 
328-333 
in three-way nested classification, 
417-420 
in two-way crossed classification 
with interaction, 247-252
in two-way crossed classification 
without interaction, 157-161 
in two-way nested (hierarchical) 
classification, 378-380 
Model I. See Fixed effects model
(Model I)
Model II. See Random effects model
(Model II)
Model III. See Mixed model (Model III) 
MODEL keyword/statement, in SAS 
procedures, 524-525, 546-549 
Modified sequentially rejective 
Bonferroni (MSRB) test, 79 
MSRB (modified sequentially rejective 
Bonferroni) test, 79 
Multifactor layouts, 281 
Multiple comparisons 
in Latin square design, 501 
in one-way classification, 64-84 
Bonferroni’s test for. See 
Bonferroni’s test/method/interval
Dunn-Šidák test for, 79-80
Dunnett’s test for, 81-82 
Dunn’s procedure of, 78-79 
least significant difference test for, 
77-78
MSRB test for, 79 
Newman-Keuls test for, 80-81
Scheffé’s method of. See Scheffé’s 
method of multiple comparison 
SRB test for, 79 
Tukey’s method of. See Tukey’s 
method of multiple comparison 
various other methods of, 82-84
in three-way crossed classification, 
299-301 
in two-way crossed classification with 
interaction, 230-233 
in two-way crossed classification 
without interaction, 149-151 
in two-way nested (hierarchical)
classification, 360-362
use of statistical packages for, 561-565
Multiple regression analysis, method 
based on, 222-223 
Multivariate response, analysis of 
variance (ANOVA) models with, 9 
Mutual orthogonality of contrasts, 66 
MVU (minimum variance unbiased) 
estimator, defined, 599 


Negative estimates of variance 
components, 28, 141, 206, 356 
Nested classifications
four-way, 403-406 
general g-way, 406-407 
partially. See Partially nested 
classifications 
three-way. See Three-way nested 
classification 
Nested design, completely or 
hierarchically, 348-349 
Nested-factorial design, defined, 431 
Nested finite population models, 
474-475
two-way, 474-475 
Newman-Keuls test, 80-81, 561-563,
565
Nonadditivity, sum of squares for, 262 
Tukey’s one degree of freedom test for, 
262-264, 503 
Noncentral chi-square distribution, 21, 
135, 195, 197, definition and 
properties of, 573-574 
Noncentral F distribution, 59, 60, 138, 
139, 148, 227, 229, 262, 
definition and properties of, 
575-576 
Noncentral t distribution, 251, 618, 669,
definition and properties of, 
574-575 
Noncentrality parameter, 21, 57-59, 135, 
136, 138, 195, 251, 298, 618, 669, 
defined, 573 
Nonnegative maximum likelihood 
estimators of variance 
components, 28n, 141n, 206n, 
356n 
Nonparametric analysis of variance, 9 
Normality 
assumption of, 20, 135, 193, 286, 351, 
399, 407 
chi-square goodness-of-fit test for,
89-90 
D’Agostino’s D test for. See 
D’Agostino’s D test for normality
effects of departures from assumption 
of, 85-86 
Shapiro-Francia’s test for, 94-96 
Shapiro-Wilk’s W test for. See 
Shapiro-Wilk’s W test for 
normality 
tests for, 89-97 
use of statistical packages for tests of, 
567 
NPAR TESTS procedure, in SPSS, 567 
Null hypothesis, defined, 597 


O’Brien procedure for testing 
homogeneity of variances, 108 
Observations, physical, variation among,
1
One observation per cell, three-way 
classification with, 301-302, 303 
One-sided confidence interval, defined, 
600 
One-way classification, 11-123 
advantages and disadvantages of, 125 
assumptions of, 11-12
computational formulae and procedure 
for sums of squares in, 35-36 
confidence intervals for variance 
components in, 31-34 
corrections for departures from 
assumptions of, 108-113 
effects of departures from assumptions 
underlying, 84-89 
F test equivalent to two-sample t test
in, 582-584 
F tests in, 22-25 
mathematical model of, 11 
point estimation in, 26-31 
power of F test 
in fixed effects model (Model I), 
57-60 
in random effects model (Model II), 
60-61 
statistical computing packages in, 52 
worked examples using, 52, 53-56 
tests for departures from assumptions 
of, 89-108 
One-way finite population model, 
461-462 
One-way fixed effects design, table of 
minimum sample size per 
treatment group needed in, 
634-635 
One-way random effects design, table of 
power and optimum number of 
levels in, 630-633 
ONEWAY procedure, in SPSS, 52,
550-552
program and output, 53, 54, 489
Operating characteristic curves for F
tests in random effects model
(Model II), 61, 681-685
Orthogonal contrasts, defined, 65
Outliers, detecting, 97


p-value, defined, 23, in hypothesis 
testing, 597 
Paired t test, F test equivalent to,
584-586 
Parameter, defined, 598 
Partially hierarchical models, 434 
Partially nested classifications, 431-460 
analysis of variance for, 433-436 
computational formulae and procedure 
for sums of squares in, 437 
degrees of freedom in, rule for finding, 
435 
four-factor, 438-439 
mathematical model of, 431-433 
statistical computing packages in, 448, 
worked example using, 450-451 
Partition of the total sum of squares 
in one-way classification, 14-15
in three-way crossed classification, 
285-286 
in two-way crossed classification with 
interaction, 182-183 
in two-way crossed classification 
without interaction, 128-129 
Pearson-Hartley charts, 58, 59, 61, 148, 
251, 672-680 
Percentage points of the standard normal 
distribution, table of, 607 
Percentiles of the chi-square distribution, 
table of, 610-611 
Physical observations, variation among, 1 
Point estimation, general method of, 
598-599 
in fixed effects model (Model I) 
in two-way crossed classification 
with interaction, 203-205 
in two-way crossed classification 
without interaction, 139-141 
in two-way nested (hierarchical) 
classification, 354-355 
in Latin square design, 500-501 
in mixed model (Model III) 
in two-way crossed classification 
with interaction, 206-207
in two-way crossed classification
without interaction, 141
in two-way nested (hierarchical) 
classification, 356, 357 
in one-way classification, 26-31 
in random effects model (Model II) 
in two-way crossed classification 
with interaction, 205-206
in two-way crossed classification 
without interaction, 141 
in two-way nested (hierarchical) 
classification, 355-356 
in three-way crossed classification, 
292-297 
in two-way crossed classification with 
interaction, 203-207 
in two-way crossed classification 
without interaction, 139-141 
in two-way crossed finite population 
model, 467-468 
in two-way nested (hierarchical) 
classification, 354-356, 365-366 
Population variances, multiple 
comparison for unequal, 84 
Populations, finite and infinite, 7-8 
POSTHOC subcommand, in SPSS 
ONEWAY and GLM procedures, 
563-564 
Power and optimum number of levels in 
one-way random effects F test, 
table of, 630-633 
Power function charts 
of F test in fixed effects model 
(Model I), 672-680 
of two-sided Student’s t test,
669-671 
Power of a test, defined, 597 
Power of F test 
in fixed effects model (Model I), table 
of, 621-629 
in Latin square design, 501 
in one-way classification, 52, 57-61 
in random effects model (Model II), 
table of optimum number of 
levels and, 630-633
in three-way crossed classification, 
298-299 
in two-way crossed classification with 
interaction, 227-230 
in fixed effects model (Model I), 
227-228 
in mixed model (Model III), 
229-230 
in random effects model (Model II), 
228-229 
in two-way crossed classification 
without interaction, 148-149 
in two-way nested (hierarchical) 
classification, 360 
Power of Student’s t test, table of,
618-620 
POWER subcommand, in SPSS 
MANOVA procedure, 560 
Power transformation, 112-113 
PRINT subcommand, in SPSS 
VARCOMP procedure, 558 
PROC ANOVA, in SAS, 52, 164, 252, 
253, 333, 334, 381, 448, 524, 
544-547, 549 
program and output, 53, 54, 165, 254, 
334, 335, 489, 496, 506, 511 
PROC GLM, in SAS, 52, 164, 252, 253,
333, 334, 381, 421, 448, 451, 524, 
544, 549 
program and output, 55, 56, 166, 167, 
255, 256, 258, 337, 382, 384, 385, 
421, 423, 424, 450, 518 
PROC LATTICE, in SAS, 520 
PROC MIXED, in SAS, 164, 252, 333, 
381, 448, 525, 545, 549 
program and output, 604 
worked examples using, 603-604 
PROC MULTTEST, in SAS, 562 
PROC NESTED, in SAS, 381, 448, 544, 
545, 547, 548 
program and output, 383 
PROC UNIVARIATE, in SAS, 52, 
567 
PROC VARCOMP, in SAS, 164, 252,
333, 381, 448, 545, 549, 557 
Proportional frequencies in an 
unbalanced three-way crossed 
classification, 311-314 
unbalanced two-way crossed 
classification, 
for fixed effects analysis, 215-217 
for random effects analysis, 223 
Protected least significant difference, 
77-78 
Pseudo-F test 
in higher order crossed classifications, 
308 
in partially nested classifications, 
442-443 
in three-way crossed classification, 
290-292, 293, 327 
in two-way crossed finite population 
model, 466-467, 477-479 
use of Satterthwaite procedure 
in constructing, 578-580 


Quasi-factorials, 519 


Random effects, 4, 5, concept of, 6-7 
Random effects analysis, in an 
unbalanced two-way crossed 
classification model, 223-226 
general case of unequal frequencies 
for, 223-226 
proportional frequencies for, 223 
Random effects model (Model II), 4, 5,
481 
effects of violations of assumptions of 
two-way crossed classification 
with interaction, 269 
expectations of mean squares in 
in two-way crossed classification 
with interaction, 188-190 
in two-way crossed classification 
without interaction, 132-133 
F tests in 
in one-way Classification, 24-25 
in three-way crossed classification, 
288-291
in two-way crossed classification 
with interaction, 200-202
in two-way crossed classification 
without interaction, 139 
in various other designs or models. 
See under a specific design or 
model 
interval estimation in 
in two-way crossed classification 
with interaction, 208-209 
in two-way crossed classification
without interaction, 143 
in two-way nested (hierarchical) 
classification, 358 
in various other designs or models. 
See under a specific design or 
model 
operating characteristic curves for F 
tests in, charts of, 61, 681-685 
point estimation in 
in two-way crossed classification 
with interaction, 205-206 
in two-way crossed classification 
without interaction, 141 
in two-way nested (hierarchical) 
classification, 355-356 
in various other designs or models. 
See under a specific design or 
model 
power and optimum number of levels 
of F test in one-way, table of, 
630-633 
power of F test in 
in one-way classification, 60-61 
in two-way crossed classification 
with interaction, 228-229 
in various other designs or models. 
See under a specific design or 
model 
sampling distribution of mean 
squares in 
in two-way crossed classification 
with interaction, 196 
in two-way crossed classification 
without interaction, 136 
in various other designs or models. 
See under a specific design or 
model 
worked examples for 
in one-way classification, 43-52
in partially nested classifications, 
439-443 
in three-way crossed classification, 
322-328 
in three-way nested classification, 
407-411 
in two-way crossed classification 
with interaction, 244-247 
in two-way crossed classification 
without interaction, 154-157 
in two-way nested (hierarchical) 
classification, 371-374 
for unequal numbers in subclasses 
in three-way nested classification, 
411-417 
in two-way nested (hierarchical) 
classification, 374-378 
RANDOM keyword/statement/ 
subcommand, in SAS and SPSS 
GLM procedures, 252, 333, 381, 
448, 525, 546, 547-548, 556 
Random numbers, table of, 665-668 
Random sample, defined, 595-596 
Randomization, in experimental design, 
483-484, 488, 493, 497 
Randomized block design (RBD), 488, 
490-495, 496 
analysis of variance for, 490-491 
generalized (GRBD), 494 
mathematical model of, 490 
missing observations in, 493 
relative efficiency of, 493-494 
replications in, 494 
worked example for, 494-495 
Randomized design, completely. See 
Completely randomized design 
Randomness, concept of, 596 
Range of the set of means, defined, 80 
RANGES subcommand, in SPSS 
ONEWAY procedure, 562-563 
RBD. See Randomized block design 
RE. See Relative efficiency 
Reciprocal transformation, 111-112 
Relative efficiency (RE), defined, 493n
of randomized block design, 493-494
of Latin square design, 503-504
Reliability, of an estimate, 599 
REML, option for computing restricted 
maximum likelihood estimates in 
SAS PROC MIXED and SPSS 
VARCOMP procedures, 549, 557 
Repeated measures design, 521-522 
Replications, in experimental design, 
483 
in Latin square design, 504 
in randomized block design, 494 
Representative sample, concept of, 
595 
Restricted maximum likelihood. See
REML 

Row efficiency, of a Latin square design, 
504 


Sample size, power and determination of, 
61-64 
Sample size determination using smallest 
detectable difference, 63-64 
Sample, defined, 595 
Sampling distribution, defined, 596 
Sampling distribution of mean squares 
in fixed effects model (Model I) 
in two-way crossed classification 
with interaction, 193, 195-196 
in two-way crossed classification 
without interaction, 135-136 
in mixed model (Model III)
in two-way crossed classification 
with interaction, 196-197 
in two-way crossed classification 
without interaction, 136-137
in one-way classification, 20-22 
in random effects model (Model II) 
in two-way crossed classification 
with interaction, 196 
in two-way crossed classification 
without interaction, 136 
in two-way crossed classification with 
interaction, 193-197
in two-way crossed classification 
without interaction, 135-137 
SAS (Statistical Analysis System), 
543-550. See also Statistical 
computing packages and entries 
following PROC 
SAS PROBMC function, 82 
Satterthwaite procedure, 144, 209, 210, 
365, 367, 403, 468, 578-580 
Scheffé’s mixed model, 267-268, 285n 
Scheffé’s method of multiple comparison 
in Latin square design, 501 
in one-way classification, 73-76
effects of departures from 
assumptions in, 86, 87 
interpretation of, 76-77 
relative merits and drawbacks of, 77 
in three-way crossed classification, 
300-301 


Subject Index 


Scheffé’s method (cont.) 
in two-way crossed classification with 
interaction, 230, 231, 232, 233, 
251 
in two-way crossed classification 
without interaction, 150, 
153-154, 160-161 
in two-way nested (hierarchical) 
classification, 361, 365 
using statistical computing packages, 
561-565 
Sequentially rejective Bonferroni (SRB) 
procedure, 79 
Shapiro-Francia’s test for normality, 
94-96 
Shapiro-Wilk’s W test for normality, 
93-94, 102-103 
table of coefficients of order statistics 
for, 656 
table of critical values of, 657 
Significance level, concept of, 597 
Skewness, 85, 90 
coefficient of. See Coefficient of 
skewness 
test for, 90-91 
Smallest detectable difference, sample 
size determination using, 
63-64 
Snedecor’s F distribution. See F 
distribution 
SPECIAL keyword, in SPSS GLM 
procedure, 564 
Split-plot design, 512-516 
analysis of variance for, 513-515 
mathematical model of, 513 
missing values in, 514, 516 
worked example for, 516, 517-519 
Split-split-plot design, 523 
SPSS, 543, 550-558. See also Statistical 
computing packages and entries 
following a particular procedure 
Square-root transformation, 109-110, 
111 
Square transformation, 112 
SRB (sequentially rejective Bonferroni) 
procedure, 79 
Standard deviation, defined, 587 
Standard error of an estimator, defined, 
599n 
Standard normal distribution
cumulative, table of, 605-606 
percentage points of, table of, 607 

Statistic, term, 596 

Statistical Analysis System. See SAS 

Statistical computing packages, 543 
analysis of variance using, 543-567 
in finite population models, 481 
in one-way classification, 52
worked examples using, 52, 53-56 
in partially nested classifications, 448 
worked example using, 450-451
in three-way crossed classification, 
333-334 
worked examples using, 334-338 
in three-way nested classifications, 421 
worked examples using, 421-425
in two-way crossed classification with 
interaction, 252 
worked examples using, 253, 
254-256, 257, 258, 259 
in two-way crossed classification 
without interaction, 164 
worked examples using, 164-167 
in two-way nested (hierarchical) 
classification, 381 
worked examples using, 381-386 
in various other designs or models. See 
under a specific design or model 
Statistical inference, methods of, 
596-601 

Statistical packages, use of 
for computing power, 560-561 
for performing multiple comparisons,
561-565
for performing tests of
homoscedasticity, 565-567

Statistical Product and Service Solutions, 
543 

Statistical tables and charts, 605-688. 
See also entries under a specific 
table or chart

Studentized augmented range distribution, 83
table of critical values of, 654
Studentized critical range values, 80
Studentized maximum modulus
distribution, 84
definition and properties of, 577-578
table of critical values of, 652-653 
Studentized range distribution, 71, 80
definition and properties of, 576-577 
table of critical values of, 636-638

Student’s t distribution. See t distribution
Student’s t test. See t test

Sum(s) of squares, defined, 2 
for Tukey’s test of nonadditivity, 262 
rules for calculating, 590-591 
Types I, II, III, and IV, 545

Super magic Latin squares, 522 

Systematic effects, 4. See also entries
under Fixed effects


t distribution, 68, 74, 76, 618, 669, 
definition and properties of,
569-570
doubly noncentral. See Doubly
noncentral t distribution

noncentral. See Noncentral t 
distribution 

table of critical values of, 608-609 

t test, 68, 69, 78, 252, 618, 669 

paired, F test equivalent to, 584-586 

power function charts of the two-sided, 
669-671 

two-sample, F test equivalent to, in 
one-way classification, 582-584 

TEST statement/option/subcommand, in 
SAS and SPSS GLM procedures, 
252, 333, 381, 448, 547-548, 
556-557 

Test statistic, defined, 597 

Three- and higher-order crossed 
classifications, 281-345, unequal 
sample sizes in, 311-314 

Three-way classification with one 
observation per cell, 301-302, 
303 

Three-way crossed classification, 
281-302, 311-345 

assumptions of, 284-285
computational formulae and procedure 
for sums of squares in, 297-298
expectations of mean squares in, 286 
F tests in, 286, 288-292 
interval estimation in, 292-297
mathematical model of, 281-283 
multiple comparisons in, 299-301 
partition of the total sum of squares in, 
285-286 
point estimation in, 292-297
power of F test in, 298-299 
statistical computing packages in, 
333-334 
worked examples using, 334-338
Three-way crossed finite population 
model, 470-472 
Three-way nested classification, 395-429 
analysis of variance of, 396-399
mathematical model of, 395-396 
statistical computing packages in, 421 
worked examples using, 421-425
tests of hypotheses and estimation in, 
399-400 
unequal numbers in subclasses in, 
400-403 
worked example for, 411-417 
Transformations, 109-113 
to correct lack of normality, 109-110 
to correct lack of homoscedasticity, 
110-113 
Treatments fixed 
blocks fixed and, 490-492 
blocks random and, 492-493 
Treatments random 
blocks fixed and, 493 
blocks random and, 492 
Tukey-Kramer intervals, 82 
Tukey-Kramer-Miller-Winer procedure, 
82 
Tukey’s method of multiple comparison 
in Latin square design, 501 
in one-way classification, 70-72
effects of departures from 
assumptions in, 86 
interpretation of, 76-77 
relative merits and drawbacks of, 
77 
in three-way crossed classification, 
300-301 
in two-way crossed classification with 
interaction, 230, 231, 232, 233, 
250 
in two-way crossed classification 
without interaction, 150, 
152-153, 159-160 
in two-way nested (hierarchical) 
classification, 361 
using statistical computing packages, 
561-563, 565 
Tukey’s one degree of freedom test for
nonadditivity, 262-264, 503
Two-sample t test, F test equivalent to, in
one-way classification, 582-584 
Two-sided confidence interval, defined, 
600 
Two-sided Student’s t test, power
function charts of, 669-671 
Two-stage nested design, 13 
Two-way crossed classification, defined, 
125-126 
Two-way crossed classification with 
interaction, 177-280
assumptions of, 180-182 
best linear unbiased estimation 
(BLUE) in, 204 
computational formulae and procedure 
for sums of squares in, 210-212 
effects of violations of assumptions of, 
268-270 
expectations of mean squares in, 
184-193 
F tests in, 197-203 
interval estimation in, 207-210 
mathematical model of, 177-180 
multiple comparisons in, 230-233 
partition of the total sum of squares in, 
182-183 
point estimation in, 203-207 
power of F test in, 227-230 
sampling distribution of mean squares 
in, 193, 195-197 
statistical computing packages in, 252 
worked examples using, 253, 
254-256, 257, 258, 259
with unequal sample sizes per cell, 
212-227 
worked example for, 237-244
Two-way crossed classification without 
interaction, 125-175 
assumptions of, 127-128 
best linear unbiased estimates (BLUE) 
in, 140 
computational formulae and procedure 
for sums of squares in, 144-145 
effects of violations of assumptions of, 
168 
expectations of mean squares in, 
130-134 
F tests in, 137-139 
interval estimation in, 142-144
mathematical model of, 126 
missing observations in, 145-148 
multiple comparisons in, 149-150 
partition of the total sum of squares in, 
128-129 
point estimation in, 139-141 
power of F test in, 148-149 
sampling distribution of mean squares 
in, 135-137 
statistical computing packages in, 164 
worked examples using, 164-167
Two-way crossed finite population 
model, 462-470 
F tests in, 466-467 
interval estimation in, 468-470 
point estimation in, 467-468
Two-way nested (hierarchical) 
classification, 347-394 
analysis of variance of, 350-351 
assumptions of, 350 
computational formulae and procedure 
for sums of squares in, 359, 
362-363 
F tests in, 351, 353, 363, 365 
interval estimation in, 356-359, 
365-367 
mathematical model of, 349-350
multiple comparisons in, 360-362 
point estimation in, 354-356, 365-366 
power of F test in, 360 
statistical computing packages in, 381
worked examples using, 381-386 
unequal numbers in subclasses in, 
362-368 
worked example for, 374-378 
Two-way nested finite population model, 
474-475 
2? design, 523-524 
Type I error, defined, 597 
Type II error, 36, defined, 597 
Types I, II, III, and IV sums of squares,
252, 545 


UMA (uniformly most accurate) interval, 
defined, 600 

UMAU (uniformly most accurate 
unbiased) interval, defined, 601 

UMVU estimator, of the ratio of two 
variance components, 29 
Unbalanced finite population models,
475 
Unbiased confidence interval, defined, 
600 
Unbiased estimator, definition and 
example of, 598-599 
Unbiasedness, 598, 600 
Uncertainty, of an estimate, 599 
Unequal numbers of observations 
in one-way classification, 36-39 
Unequal numbers in subclasses 
in three-way nested classification, 
400-403 
worked example for, 411-417 
in two-way nested (hierarchical) 
classification, 362-368 
worked example for, 374-378 
Unequal sample sizes and population 
variances 
in multiple comparisons, 82-84 
Unequal sample sizes per cell 
in two-way crossed classification with 
interaction, 212-227 
in three- and higher-order 
classifications, 311-314 
UNIANOVA procedure, in SPSS, 551, 
566 
Uniformly minimum variance unbiased 
estimator. See UMVU estimator 
Uniformly most accurate (UMA) 
interval, defined, 600 
Uniformly most accurate unbiased 
(UMAU) interval, defined, 601
Uniformly shortest length (USL) interval, 
defined, 601 
Univariate analysis of variance (ANOVA) 
models, defined, 8 
Unweighted means, method of, 217-220 
Upper confidence interval, defined, 600 
USL (uniformly shortest length) interval,
defined, 601 


VARCOMP procedure, in SPSS, 164, 
252, 333, 451, 557-558 
Variance(s), defined, 587 
error, degrees of freedom in one-way 
classification for, 125 
homogeneity of. See Homogeneity of 
variances 
unequal population, in multiple 
comparisons, 84 
Variance components, defined, 12 
literature on, 580 
confidence intervals for, in one-way 
classification, 31-34 
estimation of. See Point estimation and 
Interval estimation entries 
Variance components model. See 
Random effects model (Model II);
Mixed effects model (Model III)
VS keyword, in SPSS GLM and 
MANOVA procedures, 554-557 


W test. See Shapiro-Wilk’s W test for 
normality 

Weighted-squares-of-means analysis, 
220-222 

WELCH option, in SAS GLM MEANS 
statement, 566 

Within group sum of squares, 15, 35-36 

WITHIN keyword, in SPSS GLM and 
MANOVA procedures, 553-555 

Worked examples. See under a specific 
model or design 


Youden squares, 520 
Z distribution. See Fisher’s Z distribution 


